{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T11:27:29Z","timestamp":1776338849138,"version":"3.51.2"},"reference-count":59,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2025,1,17]],"date-time":"2025-01-17T00:00:00Z","timestamp":1737072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","award":["R01LM014344"],"award-info":[{"award-number":["R01LM014344"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","award":["R01LM014573"],"award-info":[{"award-number":["R01LM014573"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","award":["R01LM009886"],"award-info":[{"award-number":["R01LM009886"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","award":["T15LM007079"],"award-info":[{"award-number":["T15LM007079"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["R01HG012655"],"award-info":[{"award-number":["R01HG012655"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational 
Sciences","doi-asserted-by":"publisher","award":["UL1TR001873"],"award-info":[{"award-number":["UL1TR001873"]}],"id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational Sciences","doi-asserted-by":"publisher","award":["UL1TR002384"],"award-info":[{"award-number":["UL1TR002384"]}],"id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>Extracting PICO elements\u2014Participants, Intervention, Comparison, and Outcomes\u2014from clinical trial literature is essential for clinical evidence retrieval, appraisal, and synthesis. Existing approaches do not distinguish the attributes of PICO entities. This study aims to develop a named entity recognition (NER) model to extract PICO entities with fine granularities.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>Using a corpus of 2511 abstracts with PICO mentions from 4 public datasets, we developed a semi-supervised method to facilitate the training of a NER model, FinePICO, by combining limited annotated data of PICO entities and abundant unlabeled data. For evaluation, we divided the entire dataset into 2 subsets: a smaller group with annotations and a larger group without annotations. 
We then established the theoretical lower and upper performance bounds based on the performance of supervised learning models trained solely on the small, annotated subset and on the entire set with complete annotations, respectively. Finally, we evaluated FinePICO on both the smaller annotated subset and the larger, initially unannotated subset. We measured the performance of FinePICO using precision, recall, and F1.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Our method achieved precision\/recall\/F1 of 0.567\/0.636\/0.60, respectively, using a small set of annotated samples, outperforming the baseline model (F1: 0.437) by more than 16%. The model demonstrates generalizability to a different PICO framework and to another corpus, which consistently outperforms the benchmark in diverse experimental settings (P-value &lt; .001).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>We developed FinePICO to recognize fine-grained PICO entities from text and validated its performance across diverse experimental settings, highlighting the feasibility of using semi-supervised learning (SSL) techniques to enhance PICO entities extraction. 
Future work can focus on optimizing SSL algorithms to improve efficiency and reduce computational costs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>This study contributes a generalizable and effective semi-supervised approach leveraging large unlabeled data together with small, annotated data for fine-grained PICO extraction.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocae326","type":"journal-article","created":{"date-parts":[[2025,1,17]],"date-time":"2025-01-17T19:24:27Z","timestamp":1737141867000},"page":"555-565","source":"Crossref","is-referenced-by-count":6,"title":["Semi-supervised learning from small annotated data and large unlabeled data for fine-grained Participants, Intervention, Comparison, and Outcomes entity recognition"],"prefix":"10.1093","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2926-1063","authenticated-orcid":false,"given":"Fangyi","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Biomedical Informatics, Columbia University , New York, NY 10032,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-0077-3615","authenticated-orcid":false,"given":"Gongbo","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Columbia University , New York, NY 10032,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2681-1931","authenticated-orcid":false,"given":"Yilu","family":"Fang","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Columbia University , New York, NY 10032,","place":["United 
States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9309-8331","authenticated-orcid":false,"given":"Yifan","family":"Peng","sequence":"additional","affiliation":[{"name":"Department of Population Health Sciences, Weill Cornell Medicine , New York, NY 10065,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9624-0214","authenticated-orcid":false,"given":"Chunhua","family":"Weng","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Columbia University , New York, NY 10032,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,1,17]]},"reference":[{"key":"2025021811394890500_ocae326-B1","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1016\/j.jacr.2006.12.007","article-title":"Evidence-based medicine","volume":"4","author":"Collins","year":"2007","journal-title":"J Am Coll Radiol"},{"key":"2025021811394890500_ocae326-B2","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1136\/svn-2016-000032","article-title":"Perspective and future of evidence-based medicine","volume":"1","author":"You","year":"2016","journal-title":"Stroke Vasc Neurol"},{"key":"2025021811394890500_ocae326-B3","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1136\/adc.2005.071761","article-title":"Principles of evidence based medicine","volume":"90","author":"Akobeng","year":"2005","journal-title":"Arch Dis Child"},{"key":"2025021811394890500_ocae326-B4","doi-asserted-by":"crossref","first-page":"1593","DOI":"10.1038\/s41591-023-02366-9","article-title":"AI-generated text may have a role in evidence-based medicine","volume":"29","author":"Peng","year":"2023","journal-title":"Nat Med"},{"key":"2025021811394890500_ocae326-B5","doi-asserted-by":"crossref","first-page":"104640","DOI":"10.1016\/j.jbi.2024.104640","article-title":"Leveraging 
generative AI for clinical evidence synthesis needs to ensure trustworthiness","volume":"153","author":"Zhang","year":"2024","journal-title":"J Biomed Inform"},{"key":"2025021811394890500_ocae326-B6","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1001\/jama.2014.8167","article-title":"Meta-analysis as evidence: building a better pyramid","volume":"312","author":"Berlin","year":"2014","journal-title":"Jama"},{"key":"2025021811394890500_ocae326-B7","doi-asserted-by":"crossref","first-page":"376","DOI":"10.7326\/0003-4819-126-5-199703010-00006","article-title":"Systematic reviews: synthesis of best evidence for clinical decisions","volume":"126","author":"Cook","year":"1997","journal-title":"Ann Intern Med"},{"key":"2025021811394890500_ocae326-B8","doi-asserted-by":"crossref","first-page":"e1000326","DOI":"10.1371\/journal.pmed.1000326","article-title":"Seventy-five trials and eleven systematic reviews a day: how will we ever keep up?","volume":"7","author":"Bastian","year":"2010","journal-title":"PLoS Med"},{"key":"2025021811394890500_ocae326-B9","doi-asserted-by":"crossref","first-page":"e012545","DOI":"10.1136\/bmjopen-2016-012545","article-title":"Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry","volume":"7","author":"Borah","year":"2017","journal-title":"BMJ Open"},{"key":"2025021811394890500_ocae326-B10","first-page":"9","article-title":"The identification of clinically important elements within medical journal abstracts: Patient\u2014Population\u2014Problem, Exposure\u2014Intervention, Comparison, Outcome, Duration and Results (PECODR)","volume":"15","author":"Dawes","year":"2007","journal-title":"Inform Prim Care"},{"key":"2025021811394890500_ocae326-B11","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1162\/coli.2007.33.1.63","article-title":"Answering clinical questions with knowledge-based and statistical 
techniques","volume":"33","author":"Demner-Fushman","year":"2007","journal-title":"Computational Linguistics"},{"key":"2025021811394890500_ocae326-B12","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1186\/s12911-018-0699-2","article-title":"Combination of conditional random field with a rule based method in the extraction of PICO elements","volume":"18","author":"Chabou","year":"2018","journal-title":"BMC Med Inform Decis Mak"},{"key":"2025021811394890500_ocae326-B13","author":"Jin"},{"key":"2025021811394890500_ocae326-B14","doi-asserted-by":"crossref","first-page":"602","DOI":"10.1016\/j.neunet.2005.06.042","article-title":"Framewise phoneme classification with bidirectional LSTM and other neural network architectures","volume":"18","author":"Graves","year":"2005","journal-title":"Neural Netw"},{"key":"2025021811394890500_ocae326-B15","author":"Ma"},{"key":"2025021811394890500_ocae326-B16","doi-asserted-by":"crossref","first-page":"3856","DOI":"10.1093\/bioinformatics\/btaa256","article-title":"Advancing PICO element detection in biomedical text via deep neural networks","volume":"36","author":"Jin","year":"2020","journal-title":"Bioinformatics"},{"key":"2025021811394890500_ocae326-B17","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1186\/s12911-019-0992-8","article-title":"Improving reference prioritisation with PICO recognition","volume":"19","author":"Brockmeier","year":"2019","journal-title":"BMC Med Inform Decis Mak"},{"key":"2025021811394890500_ocae326-B18","author":"Devlin","year":"2018"},{"key":"2025021811394890500_ocae326-B19","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text 
mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2025021811394890500_ocae326-B20","author":"Liu","year":"2019"},{"key":"2025021811394890500_ocae326-B21","author":"Beltagy","year":"2019"},{"key":"2025021811394890500_ocae326-B22","author":"Nye"},{"key":"2025021811394890500_ocae326-B23","doi-asserted-by":"crossref","first-page":"btad542","DOI":"10.1093\/bioinformatics\/btad542","article-title":"Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach","volume":"39","author":"Hu","year":"2023","journal-title":"Bioinformatics"},{"key":"2025021811394890500_ocae326-B24","author":"Lee"},{"key":"2025021811394890500_ocae326-B25","author":"Abaho"},{"key":"2025021811394890500_ocae326-B26","doi-asserted-by":"crossref","first-page":"ooac107","DOI":"10.1093\/jamiaopen\/ooac107","article-title":"Not so weak PICO: leveraging weak supervision for participants, interventions, and outcomes recognition for systematic review automation","volume":"6","author":"Dhrangadhariya","year":"2023","journal-title":"JAMIA Open"},{"key":"2025021811394890500_ocae326-B27","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/s13326-022-00271-7","article-title":"An annotated corpus of clinical trial publications supporting schema-based relational information extraction","volume":"13","author":"Sanchez-Graillet","year":"2022","journal-title":"J Biomed Semantics"},{"key":"2025021811394890500_ocae326-B28","doi-asserted-by":"crossref","first-page":"103","DOI":"10.4097\/kjae.2018.71.2.103","article-title":"Introduction to systematic review and meta-analysis","volume":"71","author":"Ahn","year":"2018","journal-title":"Korean J Anesthesiol"},{"key":"2025021811394890500_ocae326-B29","author":"Mutinda"},{"key":"2025021811394890500_ocae326-B30","doi-asserted-by":"crossref","first-page":"1498","DOI":"10.1016\/j.eswa.2013.08.047","article-title":"Automatic text classification to support systematic 
reviews in medicine","volume":"41","author":"Adeva","year":"2014","journal-title":"Expert Systems with Applications"},{"key":"2025021811394890500_ocae326-B31","doi-asserted-by":"crossref","first-page":"8934","DOI":"10.1109\/TKDE.2022.3220219","article-title":"A survey on deep semi-supervised learning","volume":"35","author":"Yang","year":"2023","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2025021811394890500_ocae326-B32","author":"Yang"},{"key":"2025021811394890500_ocae326-B33","article-title":"Decoupled deep neural network for semi-supervised semantic segmentation","volume":"1495-1503","author":"Hong","year":"2015","journal-title":"Adv Neural Inform Process Syst"},{"key":"2025021811394890500_ocae326-B34","first-page":"560","author":"Roli","year":"2006"},{"key":"2025021811394890500_ocae326-B35","first-page":"1","author":"Bickel","year":"2006"},{"key":"2025021811394890500_ocae326-B36","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.ipm.2008.11.002","article-title":"Semi-supervised document retrieval","volume":"45","author":"Li","year":"2009","journal-title":"Inform Process Manage"},{"key":"2025021811394890500_ocae326-B37","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/j.csl.2010.05.002","article-title":"Semi-supervised ranking for document retrieval","volume":"25","author":"Duh","year":"2011","journal-title":"Comput Speech Lang"},{"key":"2025021811394890500_ocae326-B38","first-page":"228","author":"Erkan","year":"2007"},{"key":"2025021811394890500_ocae326-B39","first-page":"1","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2021","journal-title":"ACM Trans Comput Healthcare (HEALTH)"},{"key":"2025021811394890500_ocae326-B40","doi-asserted-by":"crossref","first-page":"13875","DOI":"10.1038\/s41598-023-40977-x","article-title":"Using pseudo-labeling to improve performance of deep neural networks for animal 
identification","volume":"13","author":"Ferreira","year":"2023","journal-title":"Sci Rep"},{"key":"2025021811394890500_ocae326-B41","first-page":"6256","article-title":"Unsupervised data augmentation for consistency training","volume":"33","author":"Xie","year":"2020","journal-title":"Adv Neural Inform Process Syst"},{"key":"2025021811394890500_ocae326-B42","author":"Zhang"},{"key":"2025021811394890500_ocae326-B43","doi-asserted-by":"crossref","first-page":"1812","DOI":"10.1093\/jamia\/ocad259","article-title":"Improving large language models for clinical named entity recognition via prompt engineering","volume":"31","author":"Hu","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"2025021811394890500_ocae326-B44","doi-asserted-by":"crossref","first-page":"1163","DOI":"10.1093\/jamia\/ocae065","article-title":"A span-based model for extracting overlapping PICO entities from randomized controlled trial publications","volume":"31","author":"Zhang","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"2025021811394890500_ocae326-B45","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1186\/s12911-022-01897-4","article-title":"Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer","volume":"22","author":"Mutinda","year":"2022","journal-title":"BMC Med Inform Decis Mak"},{"key":"2025021811394890500_ocae326-B46","volume-title":"Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit","author":"Bird","year":"2009"},{"key":"2025021811394890500_ocae326-B47","author":"Sang","year":"2000"},{"key":"2025021811394890500_ocae326-B48","first-page":"145","volume-title":"Document Information Extraction via Global Tagging","author":"He","year":"2023"},{"key":"2025021811394890500_ocae326-B49","doi-asserted-by":"crossref","first-page":"2633","DOI":"10.1038\/s41591-023-02552-9","article-title":"Optimized glycemic control of type 2 diabetes with reinforcement learning: a 
proof-of-concept trial","volume":"29","author":"Wang","year":"2023","journal-title":"Nat Med"},{"key":"2025021811394890500_ocae326-B50","doi-asserted-by":"crossref","first-page":"1064","DOI":"10.1016\/S0140-6736(00)02039-0","article-title":"Subgroup analysis and other (mis) uses of baseline data in clinical trials","volume":"355","author":"Assmann","year":"2000","journal-title":"Lancet"},{"key":"2025021811394890500_ocae326-B51","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1097\/01.blo.0000218736.23506.fe","article-title":"Misuse of baseline comparison tests and subgroup analyses in surgical trials","volume":"447","author":"Bhandari","year":"2006","journal-title":"Clin Orthop Relat Res"},{"key":"2025021811394890500_ocae326-B52","author":"Nakayama","year":"2018"},{"key":"2025021811394890500_ocae326-B53","doi-asserted-by":"crossref","first-page":"84","DOI":"10.3390\/data6080084","article-title":"The automatic detection of dataset names in scientific articles","volume":"6","author":"Heddes","year":"2021","journal-title":"Data"},{"key":"2025021811394890500_ocae326-B54","first-page":"251","article-title":"A probabilistic model for identifying protein names and their name boundaries","volume":"2","author":"Seki","year":"2003","journal-title":"Proc IEEE Comput Soc Bioinform Conf"},{"key":"2025021811394890500_ocae326-B55","author":"Banitalebi-Dehkordi"},{"key":"2025021811394890500_ocae326-B56","author":"Chen"},{"key":"2025021811394890500_ocae326-B57","first-page":"3239","article-title":"Realistic evaluation of deep semi-supervised learning algorithms","author":"Oliver","year":"2018","journal-title":"Adv in Neural Inform Process Syst"},{"key":"2025021811394890500_ocae326-B58","article-title":"Unlabeled data: now it helps, now it doesn\u2019t","volume":"1513-1520.","author":"Singh","year":"2008","journal-title":"Adv Neural Inform Process 
Syst"},{"key":"2025021811394890500_ocae326-B59","doi-asserted-by":"crossref","first-page":"542","DOI":"10.1007\/s41666-023-00141-6","article-title":"Towards more generalizable and accurate sentence classification in medical abstracts with less data","volume":"7","author":"Hu","year":"2023","journal-title":"J Healthc Inform Res"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/advance-article-pdf\/doi\/10.1093\/jamia\/ocae326\/61489420\/ocae326.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/advance-article-pdf\/doi\/10.1093\/jamia\/ocae326\/61489420\/ocae326.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T11:40:13Z","timestamp":1739878813000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/32\/3\/555\/7959781"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,17]]},"references-count":59,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,1,17]]},"published-print":{"date-parts":[[2025,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae326","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,3]]},"published":{"date-parts":[[2025,1,17]]}}}