{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T15:59:15Z","timestamp":1761580755199},"reference-count":11,"publisher":"World Scientific Pub Co Pte Lt","issue":"01","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Bioinform. Comput. Biol."],"published-print":{"date-parts":[[2010,2]]},"abstract":"<jats:p> The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. The generation of this corpus requires that the annotations from different automatic annotation systems be harmonized. In the first phase, the annotation systems from five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All annotations were delivered in a common annotation format that included concept identifiers in the boundary assignments and that enabled comparison and alignment of the results. During the harmonization phase, the results produced from those different systems were integrated in a single harmonized corpus (\"silver standard\" corpus) by applying a voting scheme. We give an overview of the processed data and the principles of harmonization \u2014 formal boundary reconciliation and semantic matching of named entities. Finally, all submissions of the participants were evaluated against that silver standard corpus. We found that species and disease annotations are better standardized amongst the partners than the annotations of genes and proteins. The raw corpus is now available for additional named entity annotations. Parts of it will be made available later on for a public challenge. We expect that we can improve corpus building activities both in terms of the numbers of named entity classes being covered, as well as the size of the corpus in terms of annotated documents. <\/jats:p>","DOI":"10.1142\/s0219720010004562","type":"journal-article","created":{"date-parts":[[2010,2,22]],"date-time":"2010-02-22T11:18:45Z","timestamp":1266837525000},"page":"163-179","source":"Crossref","is-referenced-by-count":67,"title":["CALBC SILVER STANDARD CORPUS"],"prefix":"10.1142","volume":"08","author":[{"given":"DIETRICH","family":"REBHOLZ-SCHUHMANN","sequence":"first","affiliation":[{"name":"EMBL Outstation \u2014 Hinxton, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK"}]},{"given":"ANTONIO JOS\u00c9 JIMENO","family":"YEPES","sequence":"additional","affiliation":[{"name":"EMBL Outstation \u2014 Hinxton, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK"}]},{"given":"ERIK M.","family":"VAN MULLIGEN","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center, NL-3000 Rotterdam, The Netherlands"}]},{"given":"NING","family":"KANG","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center, NL-3000 Rotterdam, The Netherlands"}]},{"given":"JAN","family":"KORS","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center, NL-3000 Rotterdam, The Netherlands"}]},{"given":"DAVID","family":"MILWARD","sequence":"additional","affiliation":[{"name":"Linguamatics Ltd, St. John's Innovation Centre, Cowley Rd, Cambridge, CB4 0WS, UK"}]},{"given":"PETER","family":"CORBETT","sequence":"additional","affiliation":[{"name":"Linguamatics Ltd, St. John's Innovation Centre, Cowley Rd, Cambridge, CB4 0WS, UK"}]},{"given":"EKATERINA","family":"BUYKO","sequence":"additional","affiliation":[{"name":"JULIE Lab, Friedrich-Schiller-Universit\u00e4t Jena, D-07743 Jena, Germany"}]},{"given":"ELENA","family":"BEISSWANGER","sequence":"additional","affiliation":[{"name":"JULIE Lab, Friedrich-Schiller-Universit\u00e4t Jena, D-07743 Jena, Germany"}]},{"given":"UDO","family":"HAHN","sequence":"additional","affiliation":[{"name":"JULIE Lab, Friedrich-Schiller-Universit\u00e4t Jena, D-07743 Jena, Germany"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"rf1","volume":"9","author":"Smith L.","journal-title":"Genome Biol."},{"key":"rf2","volume":"9","author":"Morgan A. A.","journal-title":"Genome Biol."},{"key":"rf3","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm557"},{"key":"rf6","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp071"},{"key":"rf8","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bti475"},{"key":"rf9","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl421"},{"key":"rf10","first-page":"117","volume":"3","author":"Frantzi K.","journal-title":"Int. J. Digital Libraries"},{"key":"rf11","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-6-S1-S1"},{"key":"rf12","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2008-9-s2-s1"},{"key":"rf15","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-002-0090-8"},{"key":"rf16","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2003.11.002"}],"container-title":["Journal of Bioinformatics and Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219720010004562","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T02:42:52Z","timestamp":1565145772000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219720010004562"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,2]]},"references-count":11,"journal-issue":{"issue":"01","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2010,2]]}},"alternative-id":["10.1142\/S0219720010004562"],"URL":"https:\/\/doi.org\/10.1142\/s0219720010004562","relation":{},"ISSN":["0219-7200","1757-6334"],"issn-type":[{"value":"0219-7200","type":"print"},{"value":"1757-6334","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,2]]}}}