{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,2]],"date-time":"2025-05-02T16:47:56Z","timestamp":1746204476487,"version":"3.37.3"},"reference-count":24,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T00:00:00Z","timestamp":1653264000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T00:00:00Z","timestamp":1653264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["376059226 \/ SPP-1999"],"award-info":[{"award-number":["376059226 \/ SPP-1999"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005721","name":"Universit\u00e4t Bielefeld","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005721","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Biomed Semant"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>The evidence-based medicine paradigm requires the ability to aggregate and compare outcomes of interventions across different trials. This can be facilitated and partially automated by information extraction systems. To support the development of systems that can extract information from published clinical trials at a fine-grained and comprehensive level to populate a knowledge base, we present a corpus richly annotated at two levels. 
At the first level, entities that describe components of the PICO elements (e.g., population\u2019s age and pre-conditions, dosage of a treatment, etc.) are annotated. The second level comprises schema-level (i.e., slot-filling templates) annotations corresponding to complex PICO elements and other concepts related to a clinical trial (e.g., the relation between an intervention and an arm, the relation between an outcome and an intervention, etc.).<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>The final corpus includes 211 annotated clinical trial abstracts with substantial agreement between annotators at the entity and schema levels. The mean Kappa values for the glaucoma and T2DM corpora were 0.74 and 0.68, respectively, for single entities. The micro-averaged <jats:italic>F<\/jats:italic><jats:sub>1<\/jats:sub> score measuring inter-annotator agreement for complex entities (i.e., slot-filling templates) was 0.81. The BERT-base baseline method for entity recognition achieved average micro-<jats:italic>F<\/jats:italic><jats:sub>1<\/jats:sub> scores of 0.76 for glaucoma and 0.77 for diabetes with exact matching.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>In this work, we have created a corpus that goes beyond the existing clinical trial corpora, since it is annotated in a schematic way that represents the classes and properties defined in an ontology. 
Although the corpus is small, it has fine-grained annotations and could be used to fine-tune pre-trained machine learning models and transformers for the specific task of extracting information from clinical trial abstracts. For future work, we will use the corpus to train information extraction systems that extract single entities and predict template slot-fillers (i.e., class data\/object properties) to populate a knowledge base that relies on the C-TrO ontology for the description of clinical trials. The resulting corpus and the code for measuring inter-annotator agreement and for the baseline method are publicly available at https:\/\/zenodo.org\/record\/6365890.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s13326-022-00271-7","type":"journal-article","created":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T12:02:52Z","timestamp":1653307372000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["An annotated corpus of clinical trial publications supporting schema-based relational information extraction"],"prefix":"10.1186","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3483-265X","authenticated-orcid":false,"given":"Olivia","family":"Sanchez-Graillet","sequence":"first","affiliation":[]},{"given":"Christian","family":"Witte","sequence":"additional","affiliation":[]},{"given":"Frank","family":"Grimm","sequence":"additional","affiliation":[]},{"given":"Philipp","family":"Cimiano","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,5,23]]},"reference":[{"key":"271_CR1","volume-title":"Proc. of the 5th Joint Ontology Workshops (JOWO): Ontologies and Data in the Life Sciences","author":"O Sanchez-Graillet","year":"2019","unstructured":"Sanchez-Graillet O, Cimiano P, Witte C, Ell B. C-TrO: An Ontology for Summarization and Aggregation of the Level of Evidence in Clinical Trials. In: Proc. 
of the 5th Joint Ontology Workshops (JOWO): Ontologies and Data in the Life Sciences. Graz: CEUR-WS.org: 2019. http:\/\/ceur-ws.org\/Vol-2518\/paper-ODLS7.pdf."},{"key":"271_CR2","unstructured":"CoNLL. The SIGNLL Conference on Computational Natural Language Learning. https:\/\/www.conll.org\/. Accessed 9 Apr 2021."},{"key":"271_CR3","unstructured":"CoNLL-U Format. Universal Dependencies. https:\/\/universaldependencies.org\/docs\/format.html. Accessed 9 Apr 2021."},{"key":"271_CR4","unstructured":"Resource Description Framework (RDF). W3C. https:\/\/www.w3.org\/RDF\/. Accessed 9 Apr 2021."},{"issue":"1","key":"271_CR5","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1162\/coli.2007.33.1.63","volume":"33","author":"D Demner-Fushman","year":"2007","unstructured":"Demner-Fushman D, Lin J. Answering clinical questions with knowledge-based and statistical techniques. Comput Linguist. 2007; 33(1):63\u2013103.","journal-title":"Comput Linguist"},{"issue":"1","key":"271_CR6","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1186\/1472-6947-10-29","volume":"10","author":"F Boudin","year":"2010","unstructured":"Boudin F, Nie J-Y, Bartlett JC, Grad R, Pluye P, Dawes M. Combining classifiers for robust PICO element detection. BMC Med Inf Dec Making. 2010; 10(1):29.","journal-title":"BMC Med Inf Dec Making"},{"key":"271_CR7","unstructured":"Xu R, Garten Y, Supekar KS, Das AK, Altman RB, et al. Extracting subject demographic information from abstracts of randomized clinical trial reports. In: Medinfo 2007: Proc. of the 12th World Congress on Health (Medical) Informatics; Building Sustainable Health Systems. IOS Press: 2007. p. 550."},{"key":"271_CR8","volume-title":"Proc. of the AMIA Annual Symposium, vol. 2010","author":"J Zhao","year":"2010","unstructured":"Zhao J, Kan M-Y, Procter PM, Zubaidah S, Yip WK, Li GM. Improving search for evidence-based practice using information extraction. In: Proc. of the AMIA Annual Symposium, vol. 2010. 
Washington: American Medical Informatics Association: 2010. p. 937."},{"key":"271_CR9","doi-asserted-by":"publisher","unstructured":"Boudin F, Shi L, Nie J-Y. Improving medical information retrieval with PICO element detection. In: European Conference on Information Retrieval. Springer: 2010. p. 50\u201361. https:\/\/doi.org\/10.1007\/978-3-642-12275-0_8.","DOI":"10.1007\/978-3-642-12275-0_8"},{"key":"271_CR10","doi-asserted-by":"publisher","unstructured":"Summerscales RL, Argamon S, Bai S, Hupert J, Schwartz A. Automatic summarization of results from clinical trials. In: Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference On: 2011. p. 372\u20137. https:\/\/doi.org\/10.1109\/BIBM.2011.72.","DOI":"10.1109\/BIBM.2011.72"},{"key":"271_CR11","doi-asserted-by":"publisher","unstructured":"Trenta A, Hunter A, Riedel S. Extraction of evidence tables from abstracts of randomized clinical trials using a maximum entropy classifier and global constraints. arXiv; 2015. http:\/\/arxiv.org\/abs\/1509.05209, https:\/\/doi.org\/10.48550\/arXiv.1509.05209.","DOI":"10.48550\/arXiv.1509.05209"},{"key":"271_CR12","volume-title":"Proc. of the 11th Intern. Conf. on Language Resources and Evaluation (LREC 2018)","author":"M Zlabinger","year":"2018","unstructured":"Zlabinger M, Andersson L, Hanbury A, Andersson M, et al. Medical entity corpus with PICO elements and sentiment analysis. In: Proc. of the 11th Intern. Conf. on Language Resources and Evaluation (LREC 2018). Miyazaki: European Language Resources Association (ELRA): 2018."},{"key":"271_CR13","doi-asserted-by":"publisher","unstructured":"Nye B, Li JJ, Patel R, et al. A corpus with multi-level annotations of patients, interventions and outcomes to support language processing for medical literature. In: Proc. of ACL 2018, Meeting, vol. 2018: 2018. p. 197\u2013207. 
https:\/\/doi.org\/10.18653\/v1\/P18-1019.","DOI":"10.18653\/v1\/P18-1019"},{"key":"271_CR14","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1186\/s13643-019-0975-y","volume":"8","author":"AM O\u2019Connor","year":"2019","unstructured":"O\u2019Connor AM, Tsafnat G, Gilbert SB, Thayer KA, Shemilt I, Thomas J, Glasziou P, Wolfe MS. Still moving toward automation of the systematic review process: a summary of discussions at the third meeting of the International Collaboration for Automation of Systematic Reviews (ICASR). Syst Rev. 2019; 8:57. https:\/\/doi.org\/10.1186\/s13643-019-0975-y.","journal-title":"Syst Rev"},{"issue":"1","key":"271_CR15","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1371\/journal.pmed.0050020","volume":"5","author":"S Hopewell","year":"2008","unstructured":"Hopewell S, Clarke M, Moher D, et al. CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLoS Med. 2008; 5(1):20.","journal-title":"PLoS Med"},{"key":"271_CR16","unstructured":"PICO Linguist. MEDLINE-PubMed Search. https:\/\/babelmesh.nlm.nih.gov\/pico.php. Accessed 9 Apr 2021."},{"key":"271_CR17","doi-asserted-by":"publisher","unstructured":"Hartung M, ter Horst H, Grimm F, et al. SANTO: a web-based annotation tool for ontology-driven slot filling. In: Proc. of ACL 2018, System Demonstrations: 2018. p. 68\u201373. https:\/\/doi.org\/10.18653\/v1\/P18-4012.","DOI":"10.18653\/v1\/P18-4012"},{"key":"271_CR18","unstructured":"Hovy E. Annotation. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts: 2010. https:\/\/aclanthology.org\/P10-5004."},{"issue":"1","key":"271_CR19","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","volume":"20","author":"J Cohen","year":"1960","unstructured":"Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 
1960; 20(1):37\u201346.","journal-title":"Educ Psychol Meas"},{"key":"271_CR20","unstructured":"Carletta J. Assessing agreement on classification tasks: the kappa statistic. Comput Linguist. 1996; 22(2). https:\/\/aclanthology.org\/J96-2004."},{"key":"271_CR21","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","volume":"33","author":"JR Landis","year":"1977","unstructured":"Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33:159\u201374.","journal-title":"Biometrics"},{"key":"271_CR22","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423","volume-title":"Proc. of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1","author":"J Devlin","year":"2019","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proc. of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Minneapolis, Minnesota: Association for Computational Linguistics: 2019. p. 4171\u20134186. https:\/\/doi.org\/10.18653\/v1\/N19-1423."},{"key":"271_CR23","doi-asserted-by":"publisher","unstructured":"Kingma DP, Ba J. Adam: A method for stochastic optimization. In: Bengio Y, LeCun Y, editors. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings: 2015. https:\/\/doi.org\/10.48550\/arXiv.1412.6980.","DOI":"10.48550\/arXiv.1412.6980"},{"key":"271_CR24","unstructured":"Klie J-C, Bugert M, Boullosa B, de Castilho RE, Gurevych I. The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation. In: Proc. of the 27th Int. Conf. on Computational Linguistics: System Demonstrations: 2018. p. 5\u20139. 
http:\/\/tubiblio.ulb.tu-darmstadt.de\/106270\/."}],"container-title":["Journal of Biomedical Semantics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13326-022-00271-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13326-022-00271-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13326-022-00271-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T12:06:18Z","timestamp":1653307578000},"score":1,"resource":{"primary":{"URL":"https:\/\/jbiomedsem.biomedcentral.com\/articles\/10.1186\/s13326-022-00271-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,23]]},"references-count":24,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["271"],"URL":"https:\/\/doi.org\/10.1186\/s13326-022-00271-7","relation":{},"ISSN":["2041-1480"],"issn-type":[{"type":"electronic","value":"2041-1480"}],"subject":[],"published":{"date-parts":[[2022,5,23]]},"assertion":[{"value":"24 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 May 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 May 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"14"}}