{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T23:51:32Z","timestamp":1772149892077,"version":"3.50.1"},"reference-count":58,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,12,26]],"date-time":"2024-12-26T00:00:00Z","timestamp":1735171200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Swiss Innovation Agency Innosuisse","award":["101.466 IP-ICT"],"award-info":[{"award-number":["101.466 IP-ICT"]}]},{"name":"CTxAI: quality by design of clinical studies using explainable AI"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Objectives<\/jats:title>\n                    <jats:p>Clinical trials (CTs) are essential for improving patient care by evaluating new treatments\u2019 safety and efficacy. A key component in CT protocols is the study population defined by the eligibility criteria. This study aims to evaluate the effectiveness of large language models (LLMs) in encoding eligibility criterion information to support CT-protocol design.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Materials and Methods<\/jats:title>\n                    <jats:p>We extracted eligibility criterion sections, phases, conditions, and interventions from CT protocols available in the ClinicalTrials.gov registry. Eligibility sections were split into individual rules using a criterion tokenizer and embedded using LLMs. The obtained representations were clustered. 
The quality and relevance of the clusters for protocol design were evaluated through 3 experiments: intrinsic alignment with protocol information and human expert cluster coherence assessment, extrinsic evaluation through CT-level classification tasks, and eligibility section generation.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Sentence embeddings fine-tuned using biomedical corpora produce clusters with the highest alignment to CT-level information. Human expert evaluation confirms that clusters are well structured and coherent. Despite the high information compression, clusters retain significant CT information, up to 97% of the classification performance obtained with raw embeddings. Finally, eligibility sections automatically generated using clusters achieve 95% of the ROUGE scores obtained with a generative LLM prompted with CT-protocol details, suggesting that clusters encapsulate information useful to CT-protocol design.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>Clusters derived from sentence-level LLM embeddings effectively summarize complex eligibility criterion data while retaining relevant CT-protocol details. Clustering-based approaches provide a scalable enhancement in CT design that balances information compression with accuracy.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>Clustering eligibility criteria using LLM embeddings provides a practical and efficient method to summarize critical protocol information. 
We provide an interactive visualization of the pipeline here.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/jamia\/ocae311","type":"journal-article","created":{"date-parts":[[2024,12,6]],"date-time":"2024-12-06T07:25:39Z","timestamp":1733469939000},"page":"447-458","source":"Crossref","is-referenced-by-count":5,"title":["Analysis of eligibility criteria clusters based on large language models for clinical trial design"],"prefix":"10.1093","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7266-627X","authenticated-orcid":false,"given":"Alban","family":"Bornet","sequence":"first","affiliation":[{"name":"Department of Radiology and Medical Informatics, University of Geneva , 1202 Geneva,","place":["Switzerland"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3559-2144","authenticated-orcid":false,"given":"Philipp","family":"Khlebnikov","sequence":"additional","affiliation":[{"name":"Risklick AG , 3013 Bern,","place":["Switzerland"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-6610-202X","authenticated-orcid":false,"given":"Florian","family":"Meer","sequence":"additional","affiliation":[{"name":"Risklick AG , 3013 Bern,","place":["Switzerland"]}]},{"given":"Quentin","family":"Haas","sequence":"additional","affiliation":[{"name":"Risklick AG , 3013 Bern,","place":["Switzerland"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3309-6128","authenticated-orcid":false,"given":"Anthony","family":"Yazdani","sequence":"additional","affiliation":[{"name":"Department of Radiology and Medical Informatics, University of Geneva , 1202 Geneva,","place":["Switzerland"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4439-8212","authenticated-orcid":false,"given":"Boya","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Radiology and Medical Informatics, University of Geneva , 1202 
Geneva,","place":["Switzerland"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9473-0172","authenticated-orcid":false,"given":"Poorya","family":"Amini","sequence":"additional","affiliation":[{"name":"Risklick AG , 3013 Bern,","place":["Switzerland"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6238-4503","authenticated-orcid":false,"given":"Douglas","family":"Teodoro","sequence":"additional","affiliation":[{"name":"Department of Radiology and Medical Informatics, University of Geneva , 1202 Geneva,","place":["Switzerland"]}]}],"member":"286","published-online":{"date-parts":[[2024,12,26]]},"reference":[{"key":"2025021811403124200_ocae311-B1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4419-1586-3","volume-title":"Fundamentals of Clinical Trials","author":"Friedman","year":"2010"},{"key":"2025021811403124200_ocae311-B2","doi-asserted-by":"crossref","first-page":"1290","DOI":"10.1001\/jama.284.10.1290","article-title":"Users\u2019 guides to the medical literature: XXV. Evidence-based medicine: principles for applying the users\u2019 guides to patient care","volume":"284","author":"Guyatt","year":"2000","journal-title":"JAMA"},{"key":"2025021811403124200_ocae311-B3","doi-asserted-by":"crossref","first-page":"1233","DOI":"10.1001\/jama.297.11.1233","article-title":"Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review","volume":"297","author":"Van Spall","year":"2007","journal-title":"JAMA"},{"key":"2025021811403124200_ocae311-B4","doi-asserted-by":"crossref","first-page":"10","DOI":"10.7326\/0003-4819-137-1-200207020-00007","article-title":"Reporting the recruitment process in clinical trials: who are these patients and how did they get there?","volume":"137","author":"Gross","year":"2002","journal-title":"Ann Intern Med"},{"key":"2025021811403124200_ocae311-B5","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/S0140-6736(04)17670-8","article-title":"External validity of 
randomised controlled trials: \u201cto whom do the results of this trial apply?\u201d","volume":"365","author":"Rothwell","year":"2005","journal-title":"Lancet"},{"key":"2025021811403124200_ocae311-B6","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1001\/jama.294.2.218","article-title":"Contradicted and initially stronger effects in highly cited clinical research","volume":"294","author":"Ioannidis","year":"2005","journal-title":"JAMA"},{"key":"2025021811403124200_ocae311-B7","doi-asserted-by":"crossref","first-page":"1867","DOI":"10.1007\/s10238-022-00975-1","article-title":"A review of research on eligibility criteria for clinical trials","volume":"23","author":"Su","year":"2023","journal-title":"Clin Exp Med"},{"key":"2025021811403124200_ocae311-B8","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1038\/d41586-024-00753-x","article-title":"How AI is being used to accelerate clinical trials","volume":"627","author":"Hutson","year":"2024","journal-title":"Nature"},{"key":"2025021811403124200_ocae311-B9","doi-asserted-by":"crossref","first-page":"51","DOI":"10.4103\/picr.PICR_6_20","article-title":"Recruitment and retention of participants in clinical studies: critical issues and challenges","volume":"11","author":"Desai","year":"2020","journal-title":"Perspect Clin Res"},{"key":"2025021811403124200_ocae311-B10","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1016\/j.conctc.2018.08.001","article-title":"Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: a review","volume":"11","author":"Fogel","year":"2018","journal-title":"Contemp Clin Trials Commun"},{"key":"2025021811403124200_ocae311-B11","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1038\/nrd.2016.184","article-title":"Phase II and phase III failures: 2013\u20132015","volume":"15","author":"Harrison","year":"2016","journal-title":"Nat Rev Drug 
Discov"},{"key":"2025021811403124200_ocae311-B12","doi-asserted-by":"crossref","first-page":"e0127242","DOI":"10.1371\/journal.pone.0127242","article-title":"Terminated trials in the ClinicalTrials.gov results database: evaluation of availability of primary outcome data and reasons for termination","volume":"10","author":"Williams","year":"2015","journal-title":"PLoS One"},{"key":"2025021811403124200_ocae311-B13","first-page":"1138","article-title":"Most patients receiving routine care for rheumatoid arthritis in 2001 did not meet inclusion criteria for most recent clinical trials or American College of Rheumatology criteria for remission","volume":"30","author":"Sokka","year":"2003","journal-title":"J Rheumatol"},{"key":"2025021811403124200_ocae311-B14","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1177\/135581969900400210","article-title":"Threats to applicability of randomised trials: exclusions and selective participation","volume":"4","author":"Britton","year":"1999","journal-title":"J Health Serv Res Policy"},{"key":"2025021811403124200_ocae311-B15","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1016\/S0002-8703(03)00189-3","article-title":"Most hospitalized older persons do not meet the enrollment criteria for clinical trials in heart failure","volume":"146","author":"Masoudi","year":"2003","journal-title":"Am Heart J"},{"key":"2025021811403124200_ocae311-B16","doi-asserted-by":"crossref","first-page":"1682","DOI":"10.1001\/archinte.162.15.1682","article-title":"Representation of the elderly, women, and minorities in heart failure clinical trials","volume":"162","author":"Heiat","year":"2002","journal-title":"Arch Intern Med"},{"key":"2025021811403124200_ocae311-B17","doi-asserted-by":"crossref","first-page":"S100","DOI":"10.1038\/d41586-019-02871-3","article-title":"An AI boost for clinical 
trials","volume":"573","author":"Woo","year":"2019","journal-title":"Nature"},{"key":"2025021811403124200_ocae311-B18","doi-asserted-by":"crossref","first-page":"852","DOI":"10.1056\/NEJMsa1012065","article-title":"The ClinicalTrials.gov results database\u2014update and key issues","volume":"364","author":"Zarin","year":"2011","journal-title":"N Engl J Med"},{"key":"2025021811403124200_ocae311-B19","doi-asserted-by":"crossref","first-page":"1066","DOI":"10.1007\/s13311-023-01384-2","article-title":"Machine learning in clinical trials: a primer with applications to neurology","volume":"20","author":"Miller","year":"2023","journal-title":"Neurotherapeutics"},{"key":"2025021811403124200_ocae311-B20","doi-asserted-by":"crossref","first-page":"1062","DOI":"10.1093\/jamia\/ocx019","article-title":"EliIE: an open-source information extraction system for clinical trial eligibility criteria","volume":"24","author":"Kang","year":"2017","journal-title":"J Am Med Inform Assoc"},{"key":"2025021811403124200_ocae311-B21","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1093\/jamia\/ocy178","article-title":"Criteria2Query: a natural language interface to clinical databases for cohort definition","volume":"26","author":"Yuan","year":"2019","journal-title":"J Am Med Inform Assoc"},{"key":"2025021811403124200_ocae311-B22","doi-asserted-by":"crossref","first-page":"i116","DOI":"10.1136\/amiajnl-2011-000321","article-title":"EliXR: an approach to eligibility criteria extraction and representation","volume":"18","author":"Weng","year":"2011","journal-title":"J Am Med Inform Assoc"},{"key":"2025021811403124200_ocae311-B23","doi-asserted-by":"crossref","first-page":"e0263193","DOI":"10.1371\/journal.pone.0263193","article-title":"Prediction of clinical trial enrollment rates","volume":"17","author":"Bieganek","year":"2022","journal-title":"PLoS One"},{"key":"2025021811403124200_ocae311-B24","article-title":"A machine learning approach for recruitment prediction in clinical trial 
design","author":"Liu"},{"key":"2025021811403124200_ocae311-B25","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1002\/sim.8036","article-title":"Statistical modeling and prediction of clinical trial recruitment","volume":"38","author":"Lan","year":"2019","journal-title":"Stat Med"},{"key":"2025021811403124200_ocae311-B26","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1208\/s12248-022-00703-3","article-title":"Machine learning prediction of clinical trial operational efficiency","volume":"24","author":"Wu","year":"2022","journal-title":"AAPS J"},{"key":"2025021811403124200_ocae311-B27","doi-asserted-by":"crossref","first-page":"1206","DOI":"10.3390\/app8071206","article-title":"Learning eligibility in cancer clinical trials using deep neural networks","volume":"8","author":"Bustos","year":"2018","journal-title":"Appl Sci"},{"key":"2025021811403124200_ocae311-B28","first-page":"305","author":"Chuan","year":"2018"},{"key":"2025021811403124200_ocae311-B29","author":"Devlin"},{"key":"2025021811403124200_ocae311-B30","first-page":"1029","author":"Zhang","year":"2020"},{"key":"2025021811403124200_ocae311-B31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3458754","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2022","journal-title":"ACM Trans Comput Healthc"},{"key":"2025021811403124200_ocae311-B32","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2025021811403124200_ocae311-B33","author":"Wang"},{"key":"2025021811403124200_ocae311-B34","doi-asserted-by":"crossref","first-page":"100689","DOI":"10.1016\/j.patter.2023.100689","article-title":"Deep learning-based risk prediction for interventional clinical trials based 
on protocol design: a retrospective study","volume":"4","author":"Ferdowsi","year":"2023","journal-title":"Patterns"},{"key":"2025021811403124200_ocae311-B35","first-page":"249","author":"Ferdowsi","year":"2022"},{"key":"2025021811403124200_ocae311-B36","author":"Wang"},{"key":"2025021811403124200_ocae311-B37","first-page":"9074","author":"Jin","year":"2024"},{"key":"2025021811403124200_ocae311-B38","author":"Guan"},{"key":"2025021811403124200_ocae311-B39","first-page":"2243","author":"Kim"},{"key":"2025021811403124200_ocae311-B40","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1162\/coli.2006.32.4.485","article-title":"Unsupervised multilingual sentence boundary detection","volume":"32","author":"Kiss","year":"2006","journal-title":"Comput Linguist"},{"key":"2025021811403124200_ocae311-B41","first-page":"265","article-title":"Medical subject headings (MeSH)","volume":"88","author":"Lipscomb","year":"2000","journal-title":"Bull Med Libr Assoc"},{"key":"2025021811403124200_ocae311-B42","author":"Grootendorst"},{"key":"2025021811403124200_ocae311-B43","author":"Reimers"},{"key":"2025021811403124200_ocae311-B44","doi-asserted-by":"crossref","first-page":"474","DOI":"10.26421\/JDI3.4-5","article-title":"Improved methods to aid unsupervised evidence-based fact checking for online health news","volume":"3","author":"Deka","year":"2022","journal-title":"JDI"},{"key":"2025021811403124200_ocae311-B45","first-page":"313","author":"Jaume-Santero","year":"2022"},{"key":"2025021811403124200_ocae311-B46","first-page":"2023","author":"Bornet"},{"key":"2025021811403124200_ocae311-B47","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2025021811403124200_ocae311-B48","doi-asserted-by":"crossref","first-page":"205","DOI":"10.21105\/joss.00205","article-title":"hierarchical density based 
clustering","volume":"2","author":"McInnes","year":"2017","journal-title":"JOSS"},{"key":"2025021811403124200_ocae311-B49","first-page":"2623","author":"Akiba"},{"key":"2025021811403124200_ocae311-B50","first-page":"2546","article-title":"Algorithms for hyper-parameter optimization","volume":"24","author":"Bergstra","year":"2011","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025021811403124200_ocae311-B51","doi-asserted-by":"crossref","first-page":"193","DOI":"10.3390\/info11040193","article-title":"Machine learning in python: main developments and technology trends in data science, machine learning, and artificial intelligence","volume":"11","author":"Raschka","year":"2020","journal-title":"Information"},{"key":"2025021811403124200_ocae311-B52","author":"Minhas"},{"key":"2025021811403124200_ocae311-B53","first-page":"2837","article-title":"Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance","volume":"11","author":"Xuan Vinh","year":"2010","journal-title":"J Mach Learn Res"},{"key":"2025021811403124200_ocae311-B54","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J R Stat Soc Series B Stat Methodol"},{"key":"2025021811403124200_ocae311-B55","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","article-title":"Ridge regression: biased estimation for nonorthogonal problems","volume":"12","author":"Hoerl","year":"1970","journal-title":"Technometrics"},{"key":"2025021811403124200_ocae311-B56","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J R Stat Soc Series B Stat 
Methodol"},{"key":"2025021811403124200_ocae311-B57","first-page":"74","author":"Lin","year":"2004"},{"key":"2025021811403124200_ocae311-B58","author":"Zhang"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/3\/447\/61281707\/ocae311.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/3\/447\/61281707\/ocae311.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T06:41:04Z","timestamp":1739860864000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/32\/3\/447\/7933305"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,26]]},"references-count":58,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,12,26]]},"published-print":{"date-parts":[[2025,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae311","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.10.08.24315075","asserted-by":"object"}]},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,3]]},"published":{"date-parts":[[2024,12,26]]}}}