{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T10:23:04Z","timestamp":1776939784984,"version":"3.51.4"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2023,8,7]],"date-time":"2023-08-07T00:00:00Z","timestamp":1691366400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","award":["R15LM013209"],"award-info":[{"award-number":["R15LM013209"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational Sciences","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["UL1TR002319"],"award-info":[{"award-number":["UL1TR002319"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>The task of query creation from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We incorporated hybrid deep learning and rule-based modules for these, as well as a knowledge base of the Unified Medical Language System (UMLS) and linked ontologies. To enable data-model agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared the capability of LeafAI to a human database programmer to identify patients who had been enrolled in 8 clinical trials conducted at our institution. We measured performance by the number of actual enrolled patients matched by generated queries.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>LeafAI matched a mean 43% of enrolled patients with 27\u200a225 eligible across 8 clinical trials, compared to 27% matched and 14\u200a587 eligible in queries by a human database programmer. The human programmer spent 26 total hours crafting queries compared to several minutes by LeafAI.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusions<\/jats:title>\n                  <jats:p>Our work contributes a state-of-the-art data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival an experienced human programmer in finding patients eligible for clinical trials.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocad149","type":"journal-article","created":{"date-parts":[[2023,8,8]],"date-time":"2023-08-08T02:41:23Z","timestamp":1691462483000},"page":"1954-1964","source":"Crossref","is-referenced-by-count":9,"title":["LeafAI: query generator for clinical cohort discovery rivaling a human programmer"],"prefix":"10.1093","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3598-8747","authenticated-orcid":false,"given":"Nicholas J","family":"Dobbins","sequence":"first","affiliation":[{"name":"Department of Biomedical Informatics & Medical Education, University of Washington , Seattle, Washington, USA"},{"name":"Department of Research IT, UW Medicine, University of Washington , Seattle, Washington, USA"}]},{"given":"Bin","family":"Han","sequence":"additional","affiliation":[{"name":"Information School, University of Washington , Seattle, Washington, USA"}]},{"given":"Weipeng","family":"Zhou","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics & Medical Education, University of Washington , Seattle, Washington, USA"}]},{"given":"Kristine F","family":"Lan","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of Washington , Seattle, Washington, USA"}]},{"given":"H Nina","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of Washington , Seattle, Washington, USA"}]},{"given":"Robert","family":"Harrington","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of Washington , Seattle, Washington, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8011-9850","authenticated-orcid":false,"given":"\u00d6zlem","family":"Uzuner","sequence":"additional","affiliation":[{"name":"Department of Information Sciences and Technology, George Mason University , Fairfax, Virginia, USA"}]},{"given":"Meliha","family":"Yetisgen","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics & Medical Education, University of Washington , Seattle, Washington, USA"}]}],"member":"286","published-online":{"date-parts":[[2023,8,7]]},"reference":[{"issue":"1\u20132","key":"2023111709554845500_ocad149-B1","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1111\/j.1365-2702.2009.03041.x","article-title":"Clinical trials: the challenge of recruitment and retention of participants","volume":"19","author":"Gul","year":"2010","journal-title":"J Clin Nurs"},{"issue":"1","key":"2023111709554845500_ocad149-B2","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1186\/1478-4505-13-8","article-title":"Barriers and opportunities for enhancing patient recruitment and retention in clinical research: findings from an interview study in an NHS academic health science centre","volume":"13","author":"Adams","year":"2015","journal-title":"Health Res Policy Syst"},{"key":"2023111709554845500_ocad149-B3","first-page":"1754","author":"Wang","year":"2017"},{"issue":"1","key":"2023111709554845500_ocad149-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s00392-016-1025-6","article-title":"Electronic health records to facilitate clinical research","volume":"106","author":"Cowie","year":"2017","journal-title":"Clin Res Cardiol"},{"issue":"1","key":"2023111709554845500_ocad149-B5","doi-asserted-by":"crossref","first-page":"3","DOI":"10.23876\/j.krcp.2017.36.1.3","article-title":"Medical big data: promise and challenges","volume":"36","author":"Lee","year":"2017","journal-title":"Kidney Res Clin Pract"},{"issue":"1","key":"2023111709554845500_ocad149-B6","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1093\/jamia\/ocz165","article-title":"Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research","volume":"27","author":"Dobbins","year":"2020","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2023111709554845500_ocad149-B7","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1136\/jamia.2009.000893","article-title":"Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2)","volume":"17","author":"Murphy","year":"2010","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2023111709554845500_ocad149-B8","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1186\/1471-2288-14-16","article-title":"Use of the i2b2 research query tool to conduct a matched case\u2013control clinical research study: advantages, disadvantages and methodological considerations","volume":"14","author":"Johnson","year":"2014","journal-title":"BMC Med Res Methodol"},{"issue":"1","key":"2023111709554845500_ocad149-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2288-9-70","article-title":"Evaluating the informatics for integrating biology and the bedside system for clinical research","volume":"9","author":"Deshmukh","year":"2009","journal-title":"BMC Med Res Methodol"},{"issue":"4","key":"2023111709554845500_ocad149-B10","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1093\/jamia\/ocy178","article-title":"Criteria2Query: a natural language interface to clinical databases for cohort definition","volume":"26","author":"Yuan","year":"2019","journal-title":"J Am Med Inform Assoc"},{"key":"2023111709554845500_ocad149-B11","first-page":"1150","author":"Soni","year":"2020"},{"issue":"7","key":"2023111709554845500_ocad149-B12","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1093\/jamia\/ocac051","article-title":"Combining human and machine intelligence for clinical trial eligibility querying","volume":"29","author":"Fang","year":"2022","journal-title":"J Am Med Inform Assoc"},{"key":"2023111709554845500_ocad149-B13","first-page":"1029","author":"Zhang","year":"2020"},{"issue":"11","key":"2023111709554845500_ocad149-B14","doi-asserted-by":"crossref","first-page":"1218","DOI":"10.1093\/jamia\/ocz109","article-title":"Clinical trial cohort selection based on multi-level rule-based natural language processing system","volume":"26","author":"Chen","year":"2019","journal-title":"J Am Med Inform Assoc"},{"key":"2023111709554845500_ocad149-B15","first-page":"534","volume-title":"MEDINFO 2015: eHealth-Enabled Health","author":"Patr\u00e3o","year":"2015"},{"key":"2023111709554845500_ocad149-B16","doi-asserted-by":"crossref","first-page":"107236","DOI":"10.1016\/j.cie.2021.107236","article-title":"EMR2vec: bridging the gap between patient data and clinical trial","volume":"156","author":"Dhayne","year":"2021","journal-title":"Comput Ind Eng"},{"issue":"7855","key":"2023111709554845500_ocad149-B17","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1038\/s41586-021-03430-5","article-title":"Evaluating eligibility criteria of oncology trials using real-world data and AI","volume":"592","author":"Liu","year":"2021","journal-title":"Nature"},{"issue":"11","key":"2023111709554845500_ocad149-B18","doi-asserted-by":"crossref","first-page":"1203","DOI":"10.1093\/jamia\/ocz099","article-title":"Cohort selection for clinical trials using hierarchical neural network","volume":"26","author":"Xiong","year":"2019","journal-title":"J Am Med Inform Assoc"},{"key":"2023111709554845500_ocad149-B19","first-page":"574","article-title":"Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers","volume":"216","author":"Hripcsak","year":"2015","journal-title":"Stud Health Technol Inform"},{"key":"2023111709554845500_ocad149-B20","first-page":"13","volume-title":"Machine Learning for Health","author":"Bae","year":"2021"},{"key":"2023111709554845500_ocad149-B21","first-page":"36","volume-title":"Machine Learning for Healthcare Conference","author":"Park","year":"2021"},{"key":"2023111709554845500_ocad149-B22","first-page":"350","author":"Wang","year":"2020"},{"issue":"12","key":"2023111709554845500_ocad149-B23","doi-asserted-by":"crossref","first-page":"e32698","DOI":"10.2196\/32698","article-title":"A BERT-based generation model to transform medical texts to SQL queries for electronic medical records: model development and validation","volume":"9","author":"Pan","year":"2021","journal-title":"JMIR Med Inform"},{"key":"2023111709554845500_ocad149-B24","first-page":"816","author":"Patel","year":"2007"},{"key":"2023111709554845500_ocad149-B25","first-page":"11","author":"Huang","year":"2013"},{"key":"2023111709554845500_ocad149-B26","first-page":"1069","author":"Baader","year":"2018"},{"issue":"1","key":"2023111709554845500_ocad149-B27","doi-asserted-by":"crossref","first-page":"160035","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci Data"},{"key":"2023111709554845500_ocad149-B28","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1200\/CCI.20.00079","article-title":"Extending the OMOP common data model and standardized vocabularies to support observational cancer research","volume":"5","author":"Belenkaya","year":"2021","journal-title":"JCO Clin Cancer Inform"},{"key":"2023111709554845500_ocad149-B29","first-page":"86","author":"Peng","year":"2021"},{"key":"2023111709554845500_ocad149-B30","first-page":"138","author":"Zoch","year":"2021"},{"key":"2023111709554845500_ocad149-B31","doi-asserted-by":"crossref","first-page":"103239","DOI":"10.1016\/j.jbi.2019.103239","article-title":"HemOnc: a new standard vocabulary for chemotherapy regimen representation in the OMOP common data model","volume":"96","author":"Warner","year":"2019","journal-title":"J Biomed Inform"},{"issue":"2","key":"2023111709554845500_ocad149-B32","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1007\/s40264-012-0009-3","article-title":"An evaluation of the THIN database in the OMOP common data model for active drug safety surveillance","volume":"36","author":"Zhou","year":"2013","journal-title":"Drug Saf"},{"issue":"3","key":"2023111709554845500_ocad149-B33","doi-asserted-by":"crossref","first-page":"e13249","DOI":"10.2196\/13249","article-title":"Genomic common data model for seamless interoperation of biomedical data in clinical practice: retrospective study","volume":"21","author":"Shin","year":"2019","journal-title":"J Med Internet Res"},{"key":"2023111709554845500_ocad149-B34","volume-title":"Development of Common Data Module Extension for Radiology Data (R-CDM): A Pilot Study to Predict Outcome of Liver Cirrhosis with Using Portal Phase Abdominal Computed Tomography Data","author":"Kwon","year":"2019"},{"key":"2023111709554845500_ocad149-B35","first-page":"326","author":"Bender","year":"2013"},{"key":"2023111709554845500_ocad149-B36","first-page":"46","article-title":"Analysis of eligibility criteria complexity in clinical trials","volume":"2010","author":"Ross","year":"2010","journal-title":"Summit Transl Bioinformatics"},{"key":"2023111709554845500_ocad149-B37"},{"key":"2023111709554845500_ocad149-B38","author":"Docker"},{"key":"2023111709554845500_ocad149-B39","first-page":"16","author":"Johnstone","year":"1998"},{"issue":"1","key":"2023111709554845500_ocad149-B40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41597-022-01521-0","article-title":"The Leaf Clinical Trials Corpus: a new resource for query generation from clinical trial eligibility criteria","volume":"9","author":"Dobbins","year":"2022","journal-title":"Sci Data"},{"key":"2023111709554845500_ocad149-B41","author":"Devlin","year":"2018"},{"key":"2023111709554845500_ocad149-B42","author":"Herzig","year":"2021"},{"key":"2023111709554845500_ocad149-B43","first-page":"3772","author":"Roberts","year":"2016"},{"issue":"140","key":"2023111709554845500_ocad149-B44","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2023111709554845500_ocad149-B45","first-page":"17; Washington, DC","author":"Aronson","year":"2001"},{"issue":"4","key":"2023111709554845500_ocad149-B46","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1093\/jamia\/ocw177","article-title":"MetaMap Lite: an evaluation of a new Java implementation of MetaMap","volume":"24","author":"Demner-Fushman","year":"2017","journal-title":"J Am Med Inform Assoc"},{"key":"2023111709554845500_ocad149-B47","first-page":"345","article-title":"Normalizing adverse events using recurrent neural networks with attention","volume":"2020","author":"Lee","year":"2020","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"issue":"1","key":"2023111709554845500_ocad149-B48","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12874-022-01611-y","article-title":"ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials","volume":"22","author":"Rafee","year":"2022","journal-title":"BMC Med Res Methodol"},{"issue":"1\u2013107","key":"2023111709554845500_ocad149-B49","first-page":"6","article-title":"RDF primer","volume":"10","author":"Manola","year":"2004","journal-title":"W3C Recommendation"},{"issue":"Web Server issue","key":"2023111709554845500_ocad149-B50","doi-asserted-by":"crossref","first-page":"W170","DOI":"10.1093\/nar\/gkp440","article-title":"BioPortal: ontologies and integrated data resources at the click of a mouse","volume":"37","author":"Noy","year":"2009","journal-title":"Nucleic Acids Res"},{"issue":"Database issue","key":"2023111709554845500_ocad149-B51","doi-asserted-by":"crossref","first-page":"D940","DOI":"10.1093\/nar\/gkr972","article-title":"Disease ontology: a backbone for disease semantic integration","volume":"40","author":"Schriml","year":"2012","journal-title":"Nucleic Acids Res"},{"issue":"Database issue","key":"2023111709554845500_ocad149-B52","doi-asserted-by":"crossref","first-page":"D38","DOI":"10.1093\/nar\/gkq1172","article-title":"Database resources of the national center for biotechnology information","volume":"39","author":"Sayers","year":"2011","journal-title":"Nucleic Acids Res"},{"issue":"24","key":"2023111709554845500_ocad149-B53","doi-asserted-by":"crossref","first-page":"5703","DOI":"10.1093\/bioinformatics\/btaa1057","article-title":"The COVID-19 ontology","volume":"36","author":"Sargsyan","year":"2021","journal-title":"Bioinformatics"},{"key":"2023111709554845500_ocad149-B54","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1016\/j.jbi.2015.04.006","article-title":"Toward a complete dataset of drug\u2013drug interaction information from publicly available sources","volume":"55","author":"Ayvaz","year":"2015","journal-title":"J Biomed Inform"},{"issue":"1","key":"2023111709554845500_ocad149-B55","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41746-019-0110-4","article-title":"Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery","volume":"2","author":"Zhang","year":"2019","journal-title":"NPJ Digit Med"},{"key":"2023111709554845500_ocad149-B56","first-page":"783; Washington, DC","author":"Wang","year":"2008"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/12\/1954\/53477637\/ocad149.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/12\/1954\/53477637\/ocad149.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T13:29:29Z","timestamp":1700227769000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/30\/12\/1954\/7238441"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,7]]},"references-count":56,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2023,8,7]]},"published-print":{"date-parts":[[2023,11,17]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocad149","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,12,1]]},"published":{"date-parts":[[2023,8,7]]}}}