{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T06:07:48Z","timestamp":1767852468959,"version":"3.49.0"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2019,6,14]],"date-time":"2019-06-14T00:00:00Z","timestamp":1560470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>The 2018 National NLP Clinical Challenge (2018 n2c2) focused on the task of cohort selection for clinical trials, where participating systems were tasked with analyzing longitudinal patient records to determine if the patients met or did not meet any of the 13 selection criteria. This article describes our participation in this shared task.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>We followed a hybrid approach combining pattern-based, knowledge-intensive, and feature weighting techniques. After preprocessing the notes using publicly available natural language processing tools, we developed individual criterion-specific components that relied on collecting knowledge resources relevant for these criteria and pattern-based and weighting approaches to identify \u201cmet\u201d and \u201cnot met\u201d cases.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>As part of the 2018 n2c2 challenge, 3 runs were submitted. The overall micro-averaged F1 on the training set was 0.9444. On the test set, the micro-averaged F1 for the 3 submitted runs were 0.9075, 0.9065, and 0.9056. The best run was placed second in the overall challenge and all 3 runs were statistically similar to the top-ranked system. A reimplemented system achieved the best overall F1 of 0.9111 on the test set.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>We highlight the need for a focused resource-intensive effort to address the class imbalance in the cohort selection identification task.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>Our hybrid approach was able to identify all selection criteria with high F1 performance on both training and test sets. Based on our participation in the 2018 n2c2 task, we conclude that there is merit in continuing a focused criterion-specific analysis and developing appropriate knowledge resources to build a quality cohort selection system.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocz079","type":"journal-article","created":{"date-parts":[[2019,5,1]],"date-time":"2019-05-01T19:12:19Z","timestamp":1556737939000},"page":"1172-1180","source":"Crossref","is-referenced-by-count":23,"title":["Hybrid bag of approaches to characterize selection criteria for cohort identification"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3122-1936","authenticated-orcid":false,"given":"V G Vinod","family":"Vydiswaran","sequence":"first","affiliation":[{"name":"Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA"},{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Asher","family":"Strayhorn","sequence":"additional","affiliation":[{"name":"Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xinyan","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Phil","family":"Robinson","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mahesh","family":"Agarwal","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, College of Arts, Sciences, and Letters, University of Michigan-Dearborn, Dearborn, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Erin","family":"Bagazinski","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Madia","family":"Essiet","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bradley E","family":"Iott","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hyeon","family":"Joo","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"PingJui","family":"Ko","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dahee","family":"Lee","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jin Xiu","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinghui","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adharsh","family":"Murali","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Koki","family":"Sasagawa","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianshi","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nalingna","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Information, University of Michigan, Ann Arbor, Michigan, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,6,14]]},"reference":[{"key":"2021012411195959500_ocz079-B1"},{"key":"2021012411195959500_ocz079-B2","author":"Uzuner","year":"2018"},{"key":"2021012411195959500_ocz079-B3","doi-asserted-by":"crossref","DOI":"10.1093\/jamia\/ocz163","article-title":"Cohort selection for clinical trials: n2c2 2018 shared task track 1","author":"Stubbs","year":"2019","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2021012411195959500_ocz079-B4","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1055\/s-0038-1638592","article-title":"Extracting information from textual documents in the electronic health record: a review of recent research","volume":"17","author":"Meystre","year":"2008","journal-title":"Yearb Med Inform"},{"issue":"3","key":"2021012411195959500_ocz079-B5","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.cct.2010.03.005","article-title":"Automated matching software for clinical trials eligibility: measuring efficiency and flexibility","volume":"31","author":"Penberthy","year":"2010","journal-title":"Contemporary Clinical Trials"},{"key":"2021012411195959500_ocz079-B6","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1007\/978-3-319-43742-2_28","volume-title":"Secondary Analysis of Electronic Health Records","author":"Sarmiento","year":"2016"},{"key":"2021012411195959500_ocz079-B7","doi-asserted-by":"crossref","first-page":"A1359","DOI":"10.1016\/S0735-1097(14)61359-0","article-title":"Natural language processing improves phenotypic accuracy in an electronic medical record cohort of type 2 diabetes and cardiovascular disease","volume":"63 (suppl 12)","author":"Kumar","year":"2014","journal-title":"J Am Coll Cardiol"},{"issue":"6","key":"2021012411195959500_ocz079-B8","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1093\/aje\/kwt441","article-title":"Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence","volume":"179","author":"Carrell","year":"2014","journal-title":"Am J Epidemiol"},{"issue":"2","key":"2021012411195959500_ocz079-B9","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1136\/amiajnl-2013-001935","article-title":"A review of approaches to identifying patient phenotype cohorts using electronic health records","volume":"21","author":"Shivade","year":"2014","journal-title":"J Am Med Inform Assoc"},{"key":"2021012411195959500_ocz079-B10","doi-asserted-by":"crossref","first-page":"S63","DOI":"10.1016\/j.jbi.2011.10.013","article-title":"A bootstrapping algorithm to improve cohort identification using structured data","volume":"44","author":"Kandula","year":"2011","journal-title":"J Biomed Inform"},{"issue":"7","key":"2021012411195959500_ocz079-B11","doi-asserted-by":"crossref","first-page":"e2626.","DOI":"10.1371\/journal.pone.0002626","article-title":"Automated identification of acute hepatitis B using electronic medical record data to facilitate public health surveillance","volume":"3","author":"Klompas","year":"2008","journal-title":"PLOS One"},{"issue":"9","key":"2021012411195959500_ocz079-B12","doi-asserted-by":"crossref","first-page":"1612.","DOI":"10.3201\/eid1009.030978","article-title":"Computer algorithms to detect bloodstream infections","volume":"10","author":"Trick","year":"2004","journal-title":"Emerg Infect Dis"},{"issue":"6","key":"2021012411195959500_ocz079-B13","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1177\/106286069901400607","article-title":"Identifying persons with diabetes using Medicare claims data","volume":"14","author":"Hebert","year":"1999","journal-title":"Am J Med Qual"},{"issue":"6","key":"2021012411195959500_ocz079-B14","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1136\/amiajnl-2011-000121","article-title":"A method and knowledge base for automated inference of patient problems from structured data in an electronic medical record","volume":"18","author":"Wright","year":"2011","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2021012411195959500_ocz079-B15","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1016\/j.jbi.2011.05.004","article-title":"Combining PubMed knowledge and EHR data to develop a weighted Bayesian network for pancreatic cancer prediction","volume":"44","author":"Zhao","year":"2011","journal-title":"J Biomed Inform"},{"key":"2021012411195959500_ocz079-B16","first-page":"838","article-title":"Survival prediction and treatment recommendation with Bayesian techniques in lung cancer","volume":"2012","author":"Sesen","year":"2012","journal-title":"AMIA Annu Symp Proc"},{"key":"2021012411195959500_ocz079-B17","first-page":"436","article-title":"Learning to predict post-hospitalization VTE risk from EHR data","volume":"2012","author":"Kawaler","year":"2012","journal-title":"AMIA Annu Symp Proc"},{"key":"2021012411195959500_ocz079-B18","first-page":"9","article-title":"Cohort identification for clinical research: querying federated electronic healthcare records using controlled vocabularies and semantic types","author":"Keung","year":"2012","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"issue":"7","key":"2021012411195959500_ocz079-B19","doi-asserted-by":"crossref","first-page":"e0159621","DOI":"10.1371\/journal.pone.0159621","article-title":"Electronic health record based algorithm to identify patients with autism spectrum disorder","volume":"11","author":"Lingren","year":"2016","journal-title":"PLoS One"},{"key":"2021012411195959500_ocz079-B20"},{"issue":"4","key":"2021012411195959500_ocz079-B21","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1136\/jamia.2010.003707","article-title":"Symbolic rule-based classification of lung cancer stages from free-text pathology reports","volume":"17","author":"Nguyen","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2021012411195959500_ocz079-B22","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.jad.2015.12.066","article-title":"Identifying a clinical signature of suicidality among patients with mood disorders: a pilot study using a machine learning approach","volume":"193","author":"Passos","year":"2016","journal-title":"J Affect Disord"},{"key":"2021012411195959500_ocz079-B23","first-page":"3621","author":"Zhou","year":"2014"},{"issue":"e1","key":"2021012411195959500_ocz079-B24","doi-asserted-by":"crossref","first-page":"e141","DOI":"10.1093\/jamia\/ocu050","article-title":"Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials","volume":"22","author":"Miotto","year":"2015","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2021012411195959500_ocz079-B25","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1016\/j.jbi.2009.07.007","article-title":"Rule-based information extraction from patients\u2019 clinical data","volume":"42","author":"Mykowiecka","year":"2009","journal-title":"J Biomed Inform"},{"issue":"11","key":"2021012411195959500_ocz079-B26","doi-asserted-by":"crossref","first-page":"1070","DOI":"10.1086\/606164","article-title":"Use of international classification of diseases, ninth revision clinical modification codes and medication use data to identify nosocomial clostridium difficile infection","volume":"30","author":"Schmiedeskamp","year":"2009","journal-title":"Infect Control Hosp Epidemiol"},{"key":"2021012411195959500_ocz079-B27","first-page":"722","article-title":"Discovering peripheral arterial disease cases from radiology notes using natural language processing","volume":"2010","author":"Savova","year":"2010","journal-title":"In AMIA Annu Symp Proc"},{"key":"2021012411195959500_ocz079-B28","first-page":"619","article-title":"Mayo clinic smoking status classification system: extensions and improvements","author":"Sohn","year":"2009","journal-title":"AMIA Annu Symp Proc"},{"key":"2021012411195959500_ocz079-B29","first-page":"1754.","article-title":"Classifying clinical trial eligibility criteria to facilitate phased cohort identification using clinical data repositories","volume":"2017","author":"Wang","year":"2017","journal-title":"AMIA Annu Symp Proc"},{"key":"2021012411195959500_ocz079-B30","first-page":"1564","article-title":"Extracting and integrating data from entire electronic health records for detecting colorectal cancer cases","volume":"2011","author":"Xu","year":"2011","journal-title":"AMIA Annu Symp Proc"},{"key":"2021012411195959500_ocz079-B31","doi-asserted-by":"crossref","first-page":"i144","DOI":"10.1136\/amiajnl-2011-000351","article-title":"Drug side effect extraction from clinical narratives of psychiatry and psychology patients","volume":"18 (suppl 1)","author":"Sohn","year":"2011","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2021012411195959500_ocz079-B32","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1136\/jamia.2009.001560","article-title":"Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications","volume":"17","author":"Savova","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2021012411195959500_ocz079-B33","first-page":"149","article-title":"An information extraction framework for cohort identification using electronic health records","volume":"2013","author":"Liu","year":"2013","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"key":"2021012411195959500_ocz079-B34","first-page":"1191.","article-title":"Epidea: extracting structured epilepsy and seizure information from patient discharge summaries for cohort identification","volume":"2012","author":"Cui","year":"2012","journal-title":"AMIA Annu Symp Proc"},{"issue":"e1","key":"2021012411195959500_ocz079-B35","doi-asserted-by":"crossref","first-page":"e151","DOI":"10.1136\/amiajnl-2014-002642","article-title":"Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record","volume":"22","author":"Lin","year":"2015","journal-title":"J Am Med Inform Assoc"},{"key":"2021012411195959500_ocz079-B36","first-page":"857","article-title":"A high throughput semantic concept frequency based approach for patient identification: a case study using type 2 diabetes mellitus clinical notes","author":"Wei","year":"2010","journal-title":"AMIA Annu Symp Proc"},{"key":"2021012411195959500_ocz079-B37","first-page":"2250.","article-title":"HyDeXT: a hybrid de-identification and extraction tool for health text","author":"Zhao","year":"2017","journal-title":"AMIA Annu Symp Proc"},{"issue":"3","key":"2021012411195959500_ocz079-B38","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1136\/jamia.2009.002733","article-title":"An overview of MetaMap: historical perspective and recent advances","volume":"17","author":"Aronson","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2021012411195959500_ocz079-B39","author":"US National Library of Medicine"},{"issue":"5","key":"2021012411195959500_ocz079-B40","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1006\/jbin.2001.1029","article-title":"A simple algorithm for identifying negated findings and diseases in discharge summaries","volume":"34","author":"Chapman","year":"2001","journal-title":"J Biomed Inform"},{"issue":"5","key":"2021012411195959500_ocz079-B41","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1016\/j.jbi.2009.05.002","article-title":"ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports","volume":"42","author":"Harkema","year":"2009","journal-title":"J Biomed Inform"},{"key":"2021012411195959500_ocz079-B42","author":"National Institutes of Health"},{"key":"2021012411195959500_ocz079-B43","author":"US Food and Drug Administration"},{"key":"2021012411195959500_ocz079-B44","author":"UMLS Reference Manual [Internet]","year":"2009"},{"key":"2021012411195959500_ocz079-B45","first-page":"1150","article-title":"Mining consumer health vocabulary from community-generated text","author":"Vydiswaran","year":"2014","journal-title":"AMIA Annu Symp Proc"},{"issue":"6","key":"2021012411195959500_ocz079-B46","doi-asserted-by":"crossref","first-page":"737","DOI":"10.3892\/br.2016.643","article-title":"Association between glycated hemoglobin A1c levels with age and gender in Chinese adults with no prior diagnosis of diabetes mellitus","volume":"4","author":"Ma","year":"2016","journal-title":"Biomed Rep"},{"issue":"12","key":"2021012411195959500_ocz079-B47","doi-asserted-by":"crossref","first-page":"770","DOI":"10.7326\/0003-4819-152-12-201006150-00004","article-title":"Glucose-independent, black-white differences in hemoglobin A1c levels: a cross-sectional analysis of 2 studies","volume":"152","author":"Ziemer","year":"2010","journal-title":"Ann Intern Med"},{"issue":"1","key":"2021012411195959500_ocz079-B48","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1034\/j.1399-5448.2003.00020.x","article-title":"Relationship between glycemic control, ethnicity and socioeconomic status in Hispanic and white non-Hispanic youths with type 1 diabetes mellitus","volume":"4","author":"Gallegos-Macias","year":"2003","journal-title":"Pediatr Diabetes"},{"key":"2021012411195959500_ocz079-B49","author":"Richesson","year":"2014"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/11\/1172\/36089058\/ocz079.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/11\/1172\/36089058\/ocz079.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,24]],"date-time":"2021-01-24T16:20:09Z","timestamp":1611505209000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/26\/11\/1172\/5518584"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,14]]},"references-count":49,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,6,14]]},"published-print":{"date-parts":[[2019,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz079","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,11]]},"published":{"date-parts":[[2019,6,14]]}}}