{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:30:38Z","timestamp":1753893038084,"version":"3.41.2"},"reference-count":57,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T00:00:00Z","timestamp":1626134400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Res. Metr. Anal."],"abstract":"<jats:p>Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.<\/jats:p>","DOI":"10.3389\/frma.2021.674205","type":"journal-article","created":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T07:22:40Z","timestamp":1626160960000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts"],"prefix":"10.3389","volume":"6","author":[{"given":"Elizabeth T.","family":"Hobbs","sequence":"first","affiliation":[]},{"given":"Stephen M.","family":"Goralski","sequence":"additional","affiliation":[]},{"given":"Ashley","family":"Mitchell","sequence":"additional","affiliation":[]},{"given":"Andrew","family":"Simpson","sequence":"additional","affiliation":[]},{"given":"Dorjan","family":"Leka","sequence":"additional","affiliation":[]},{"given":"Emmanuel","family":"Kotey","sequence":"additional","affiliation":[]},{"given":"Matt","family":"Sekira","sequence":"additional","affiliation":[]},{"given":"James B.","family":"Munro","sequence":"additional","affiliation":[]},{"given":"Suvarna","family":"Nadendla","sequence":"additional","affiliation":[]},{"given":"Rebecca","family":"Jackson","sequence":"additional","affiliation":[]},{"given":"Aitor","family":"Gonzalez-Aguirre","sequence":"additional","affiliation":[]},{"given":"Martin","family":"Krallinger","sequence":"additional","affiliation":[]},{"given":"Michelle","family":"Giglio","sequence":"additional","affiliation":[]},{"given":"Ivan","family":"Erill","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2021,7,13]]},"reference":[{"key":"B1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4614-3223-4","volume-title":"Mining Text Data","author":"Aggarwal","year":"2012"},{"key":"B2","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1162\/coli.07-034-R2","article-title":"Inter-Coder Agreement for Computational Linguistics","volume":"34","author":"Artstein","year":"2008","journal-title":"Comput. Linguistics"},{"key":"B3","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1186\/1471-2105-13-161","article-title":"Concept Annotation in the CRAFT Corpus","volume":"13","author":"Bada","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"B4","doi-asserted-by":"publisher","first-page":"D396","DOI":"10.1093\/nar\/gkn803","article-title":"The GOA Database in 2009--an Integrated Gene Ontology Annotation Resource","volume":"37","author":"Barrell","year":"2009","journal-title":"Nucleic Acids Res."},{"volume-title":"Natural Language Processing with Python","year":"2009","author":"Bird","key":"B5"},{"key":"B6","doi-asserted-by":"publisher","first-page":"S3","DOI":"10.1186\/1471-2105-13-S11-S3","article-title":"BioNLP Shared Task - The Bacteria Track","volume":"13","author":"Bossy","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"B7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-020-1044-0","article-title":"Deep Learning with Sentence Embeddings Pre-trained on Biomedical Corpora Improves the Performance of Finding Similar Sentences in Electronic Medical Records","volume":"20","author":"Chen","year":"2020","journal-title":"BMC Med. Inform. Decis. Mak"},{"key":"B8","doi-asserted-by":"publisher","first-page":"bau075","DOI":"10.1093\/database\/bau075","article-title":"Standardized Description of Scientific Evidence Using the Evidence Ontology (ECO)","volume":"2014","author":"Chibucos","year":"2014","journal-title":"Database"},{"key":"B9","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1186\/s12866-014-0294-3","article-title":"An Ontology for Microbial Phenotypes","volume":"14","author":"Chibucos","year":"2014","journal-title":"BMC Microbiol."},{"key":"B10","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1186\/2041-1480-5-28","article-title":"Micropublications: a Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications","volume":"5","author":"Clark","year":"2014","journal-title":"J. Biomed. Sem"},{"key":"B11","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A Coefficient of Agreement for Nominal Scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educ. Psychol. Meas."},{"key":"B12","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1037\/h0026256","article-title":"Weighted Kappa: Nominal Scale Agreement Provision for Scaled Disagreement or Partial Credit","volume":"70","author":"Cohen","year":"1968","journal-title":"Psychol. Bull."},{"key":"B13","doi-asserted-by":"publisher","first-page":"bat064","DOI":"10.1093\/database\/bat064","article-title":"BioC: a Minimalist Approach to Interoperability for Biomedical Text Processing","volume":"2013","author":"Comeau","year":"2013","journal-title":"Database"},{"key":"B14","doi-asserted-by":"publisher","first-page":"3232","DOI":"10.1093\/bioinformatics\/btm495","article-title":"Mining Experimental Evidence of Molecular Function Claims from the Literature","volume":"23","author":"Crangle","year":"2007","journal-title":"Bioinformatics"},{"key":"B15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI Disease Corpus: A Resource for Disease Name Recognition and Concept Normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J. Biomed. Inform."},{"key":"B16","doi-asserted-by":"publisher","first-page":"R44","DOI":"10.1186\/gb-2005-6-5-r44","article-title":"The Sequence Ontology: a Tool for the Unification of Genome Annotations","volume":"6","author":"Eilbeck","year":"2005","journal-title":"Genome Biol."},{"key":"B17","doi-asserted-by":"crossref","DOI":"10.3115\/1654595.1654619","article-title":"Measuring Annotator Agreement in a Complex Hierarchical Dialogue Act Annotation Scheme","author":"Geertzen","year":"2006"},{"key":"B18","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1186\/1471-2105-11-85","article-title":"LINNAEUS: a Species Name Identification System for Biomedical Literature","volume":"11","author":"Gerner","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"B19","doi-asserted-by":"publisher","first-page":"D1186","DOI":"10.1093\/nar\/gky1036","article-title":"ECO, the Evidence & Conclusion Ontology: Community Standard for Evidence Information","volume":"47","author":"Giglio","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"B20","doi-asserted-by":"publisher","first-page":"914","DOI":"10.1016\/j.jbi.2013.07.011","article-title":"The DDI Corpus: An Annotated Corpus with Pharmacological Substances and Drug-Drug Interactions","volume":"46","author":"Herrero-Zazo","year":"2013","journal-title":"J. Biomed. Inform."},{"article-title":"Introducing Hypertension FACTS: Vital Sign Ontology Annotations in the Florida Annotated Corpus for Translational Science","year":"2018","author":"Hicks","key":"B21"},{"key":"B22","doi-asserted-by":"publisher","first-page":"bas020","DOI":"10.1093\/database\/bas020","article-title":"Text Mining for the Biocuration Workflow","volume":"2012","author":"Hirschman","year":"2012","journal-title":"Database"},{"key":"B23","doi-asserted-by":"publisher","first-page":"baw147","DOI":"10.1093\/database\/baw147","article-title":"The BioC-BioGRID Corpus: Full Text Articles Annotated for Curation of Protein-Protein and Genetic Interactions","volume":"2017","author":"Islamaj Dogan","year":"2017","journal-title":"Database (Oxford)"},{"key":"B24","first-page":"171","article-title":"BioCreative VI Precision Medicine Track: Creating a Training Corpus for Mining Protein-Protein Interactions Affected by Mutations","author":"Islamaj Dogan","year":"2017"},{"key":"B25","doi-asserted-by":"publisher","first-page":"S3","DOI":"10.1186\/1471-2105-9-S3-S3","article-title":"Assessment of Disease Named Entity Recognition on a Corpus of Annotated Sentences","volume":"9","author":"Jimeno","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"B26","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1186\/s13326-017-0153-x","article-title":"Semantic Annotation in Biomedicine: the Current Landscape","volume":"8","author":"Jovanovi\u0107","year":"2017","journal-title":"J. Biomed. Semant."},{"key":"B27","doi-asserted-by":"publisher","first-page":"D156","DOI":"10.1093\/nar\/gkt1123","article-title":"CollecTF: a Database of Experimentally Validated Transcription Factor-Binding Sites in Bacteria","volume":"42","author":"Kili\u00e7","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"B28","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1186\/1471-2105-9-10","article-title":"Corpus Annotation for Mining Biomedical Events from Literature","volume":"9","author":"Kim","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"B29","doi-asserted-by":"publisher","first-page":"S4","DOI":"10.1186\/gb-2008-9-s2-s4","article-title":"Overview of the Protein-Protein Interaction Annotation Extraction Task of BioCreative II","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol."},{"key":"B30","doi-asserted-by":"publisher","first-page":"W523","DOI":"10.1093\/nar\/gky428","article-title":"ezTag: Tagging Biomedical Concepts via Interactive Learning","volume":"46","author":"Kwon","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"B31","doi-asserted-by":"publisher","first-page":"btz682","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a Pre-trained Biomedical Language Representation Model for Biomedical Text Mining","author":"Lee","year":"2019","journal-title":"Bioinformatics"},{"key":"B32","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1186\/s13326-019-0200-x","article-title":"Similarity Corpus on Microbial Transcriptional Regulation","volume":"10","author":"Lithgow-Serrano","year":"2019","journal-title":"J. Biomed. Semant."},{"key":"B33","doi-asserted-by":"publisher","first-page":"bau086","DOI":"10.1093\/database\/bau086","article-title":"Overview of the Gene Ontology Task at BioCreative IV","volume":"2014","author":"Mao","year":"2014","journal-title":"Database"},{"key":"B34","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1038\/498255a","article-title":"The Big Challenges of Big Data","volume":"498","author":"Marx","year":"2013","journal-title":"Nature"},{"key":"B35","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1023\/a:1002402902356","article-title":"Tagger Evaluation Given Hierarchical Tag Sets","volume":"34","author":"Melamed","year":"2000","journal-title":"Comput. Humanit."},{"article-title":"Open-domain Anatomical Entity Mention Detection","year":"2012","author":"Ohta","key":"B36"},{"key":"B37","doi-asserted-by":"publisher","first-page":"e65390","DOI":"10.1371\/journal.pone.0065390","article-title":"The Species and Organisms Resources for Fast and Accurate Identification of Taxonomic Names in Text","volume":"8","author":"Pafilis","year":"2013","journal-title":"PLoS ONE"},{"key":"B38","doi-asserted-by":"publisher","first-page":"e1000443","DOI":"10.1371\/journal.pcbi.1000443","article-title":"Semantic Similarity in Biomedical Ontologies","volume":"5","author":"Pesquita","year":"2009","journal-title":"Plos Comput. Biol."},{"key":"B39","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1186\/1471-2105-8-50","article-title":"BioInfer: a Corpus for Information Extraction in the Biomedical Domain","volume":"8","author":"Pyysalo","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"B40","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1471-2105-16-S10-S2","article-title":"Overview of the Cancer Genetics and Pathway Curation Tasks of BioNLP Shared Task 2013","volume":"16","author":"Pyysalo","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"B41","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1471-2105-13-S11-S2","article-title":"Overview of the ID, EPI and REL Tasks of BioNLP Shared Task 2011","volume":"13","author":"Pyysalo","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"B42","doi-asserted-by":"publisher","first-page":"e237","DOI":"10.1093\/bioinformatics\/btl302","article-title":"EBIMed--text Crunching to Gather Facts for Proteins from Medline","volume":"23","author":"Rebholz-Schuhmann","year":"2007","journal-title":"Bioinformatics"},{"key":"B43","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1752-0509-8-S2-S2","article-title":"Use of Prior Knowledge for the Analysis of High-Throughput Transcriptomics and Metabolomics Data","volume":"8","author":"Reshetova","year":"2014","journal-title":"BMC Syst. Biol."},{"key":"B44","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1073\/pnas.98.2.381","article-title":"PubMed Central: The GenBank of the Published Literature","volume":"98","author":"Roberts","year":"2001","journal-title":"Proc. Natl. Acad. Sci."},{"key":"B45","doi-asserted-by":"publisher","first-page":"e1000391","DOI":"10.1371\/journal.pcbi.1000391","article-title":"How to Get the Most Out of Your Curation Effort","volume":"5","author":"Rzhetsky","year":"2009","journal-title":"Plos Comput. Biol."},{"key":"B46","article-title":"The E-Utilities in Depth: Parameters, Syntax, and More","volume-title":"Entrez Programming Utilities Help [internet]","author":"Sayers","year":"2014"},{"key":"B47","first-page":"2","article-title":"An Intrinsic Information Content Metric for Semantic Similarity in WordNet","author":"Seco","year":"2004"},{"key":"B48","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1186\/s13326-019-0205-5","article-title":"Phenotype Annotation with the Ontology of Microbial Phenotypes (OMP)","volume":"10","author":"Siegele","year":"2019","journal-title":"J. Biomed. Semant."},{"key":"B49","doi-asserted-by":"publisher","first-page":"baw161","DOI":"10.1093\/database\/baw161","article-title":"Pressing Needs of Biomedical Text Mining in Biocuration and beyond: Opportunities and Challenges","volume":"2016","author":"Singhal","year":"2016","journal-title":"Database"},{"article-title":"Normalisation with the BRAT Rapid Annotation Tool","year":"2012","author":"Stenetorp","key":"B50"},{"key":"B51","doi-asserted-by":"publisher","first-page":"bau074","DOI":"10.1093\/database\/bau074","article-title":"BC4GO: a Full-Text Corpus for the BioCreative IV GO Task","volume":"2014","author":"Van Auken","year":"2014","journal-title":"Database"},{"key":"B52","first-page":"10","article-title":"Sense Tagging: Does it Make Sense?","author":"V\u00e9ronis","year":""},{"key":"B53","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1186\/1471-2105-13-207","article-title":"A Corpus of Full-Text Journal Articles Is a Robust Evaluation Tool for Revealing Differences in Performance of Biomedical Natural Language Processing Tools","volume":"13","author":"Verspoor","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"B54","doi-asserted-by":"publisher","first-page":"S9","DOI":"10.1186\/1471-2105-9-S11-S9","article-title":"The BioScope Corpus: Biomedical Texts Annotated for Uncertainty, Negation and Their Scopes","volume":"9","author":"Vincze","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"B55","first-page":"307","article-title":"Biomedical Mention Disambiguation Using a Deep Learning Approach","author":"Wei","year":"2019"},{"key":"B56","doi-asserted-by":"publisher","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding Principles for Scientific Data Management and Stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci. Data"},{"key":"B57","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1038\/s41597-019-0055-0","article-title":"BioWordVec, Improving Biomedical Word Embeddings with Subword Information and MeSH","volume":"6","author":"Zhang","year":"2019","journal-title":"Sci. Data"}],"container-title":["Frontiers in Research Metrics and Analytics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frma.2021.674205\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T07:22:50Z","timestamp":1626160970000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frma.2021.674205\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,13]]},"references-count":57,"alternative-id":["10.3389\/frma.2021.674205"],"URL":"https:\/\/doi.org\/10.3389\/frma.2021.674205","relation":{},"ISSN":["2504-0537"],"issn-type":[{"type":"electronic","value":"2504-0537"}],"subject":[],"published":{"date-parts":[[2021,7,13]]},"article-number":"674205"}}