{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,28]],"date-time":"2026-06-28T01:10:57Z","timestamp":1782609057910,"version":"3.54.5"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2016,10,4]],"date-time":"2016-10-04T00:00:00Z","timestamp":1475539200000},"content-version":"vor","delay-in-days":518,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Objective To create a multilingual gold-standard corpus for biomedical concept recognition.<\/jats:p>\n               <jats:p>Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations.<\/jats:p>\n               <jats:p>Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language.<\/jats:p>\n               <jats:p>Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques.<\/jats:p>\n               <jats:p>Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated.<\/jats:p>","DOI":"10.1093\/jamia\/ocv037","type":"journal-article","created":{"date-parts":[[2015,5,7]],"date-time":"2015-05-07T00:31:51Z","timestamp":1430958711000},"page":"948-956","source":"Crossref","is-referenced-by-count":44,"title":["A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC"],"prefix":"10.1093","volume":"22","author":[{"given":"Jan A","family":"Kors","sequence":"first","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Simon","family":"Clematide","sequence":"additional","affiliation":[{"name":"Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Saber A","family":"Akhondi","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Erik M","family":"van Mulligen","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dietrich","family":"Rebholz-Schuhmann","sequence":"additional","affiliation":[{"name":"Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2015,5,5]]},"reference":[{"key":"2020110613011387700_ocv037-B1","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1136\/amiajnl-2014-002666","article-title":"NIH's Big Data to Knowledge initiative and the advancement of biomedical informatics","volume":"21","author":"Ohno-Machado","year":"2014","journal-title":"J Am Med Inform Assoc."},{"key":"2020110613011387700_ocv037-B2","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1016\/j.jbi.2004.08.004","article-title":"Term identification in the biomedical literature","volume":"37","author":"Krauthammer","year":"2004","journal-title":"J Biomed Inform."},{"key":"2020110613011387700_ocv037-B3","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1142\/S0219720010004562","article-title":"CALBC silver standard corpus","volume":"8","author":"Rebholz-Schuhmann","year":"2010","journal-title":"J Bioinform Comput Biol."},{"key":"2020110613011387700_ocv037-B4","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1186\/2041-1480-2-S5-S11","article-title":"Assessment of NER solutions against the first and second CALBC Silver Standard Corpus","volume":"2","author":"Rebholz-Schuhmann","year":"2011","journal-title":"J Biomed Semantics."},{"key":"2020110613011387700_ocv037-B5","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The Unified Medical Language System (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2020110613011387700_ocv037-B6","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1007\/978-3-642-40802-1_32","article-title":"Entity recognition in parallel multi-lingual biomedical corpora: the CLEF-ER laboratory overview","volume-title":"Information Access Evaluation. Multilinguality, Multimodality, and Visualization","author":"Rebholz-Schuhmann","year":"2013"},{"key":"2020110613011387700_ocv037-B7"},{"key":"2020110613011387700_ocv037-B8","first-page":"82","article-title":"Enabling recognition of diseases in biomedical text with machine learning: corpus and benchmark","author":"Leaman","year":"2009","journal-title":"Proceedings of the 3rd International Symposium on Languages in Biology and Medicine (LBM); Jeju Island, South Korea"},{"key":"2020110613011387700_ocv037-B9","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1016\/j.jbi.2012.04.008","article-title":"Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports","volume":"45","author":"Gurulingappa","year":"2012","journal-title":"J Biomed Inform."},{"key":"2020110613011387700_ocv037-B10","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1186\/1471-2105-13-161","article-title":"Concept annotation in the CRAFT corpus","volume":"13","author":"Bada","year":"2012","journal-title":"BMC Bioinformatics."},{"key":"2020110613011387700_ocv037-B11","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1136\/amiajnl-2013-002544","article-title":"Evaluating the state of the art in disorder recognition and normalization of the clinical narrative","volume":"22","author":"Pradhan","year":"2015","journal-title":"J Am Med Inform Assoc."},{"key":"2020110613011387700_ocv037-B12","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1186\/1471-2105-6-S1-S11","article-title":"Overview of BioCreAtIvE task 1B: normalized gene lists","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics."},{"key":"2020110613011387700_ocv037-B13","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/gb-2008-9-s2-s3","article-title":"Overview of BioCreative II gene normalization","volume":"9","author":"Morgan","year":"2008","journal-title":"Genome Biol."},{"key":"2020110613011387700_ocv037-B14","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-12-S8-S2","article-title":"The gene normalization task in BioCreative III","volume":"12","author":"Lu","year":"2011","journal-title":"BMC Bioinformatics."},{"key":"2020110613011387700_ocv037-B15","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/S1386-5056(02)00058-8","article-title":"Semantic annotation for concept-based cross-language medical information retrieval","volume":"67","author":"Volk","year":"2002","journal-title":"Int J Med Inform."},{"key":"2020110613011387700_ocv037-B16"},{"key":"2020110613011387700_ocv037-B17"},{"key":"2020110613011387700_ocv037-B18","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1016\/j.jbi.2003.11.002","article-title":"Exploring semantic groups through visual approaches","volume":"36","author":"Bodenreider","year":"2003","journal-title":"J Biomed Inform."},{"key":"2020110613011387700_ocv037-B19","author":"Stenetorp"},{"key":"2020110613011387700_ocv037-B20","first-page":"131","article-title":"Peregrine: lightweight gene name normalization by dictionary lookup","author":"Schuemie","year":"2007","journal-title":"Proceedings of the BioCreAtIvE II Workshop; Madrid, Spain"},{"key":"2020110613011387700_ocv037-B21","first-page":"1","article-title":"An overview of JCoRe, the JULIE lab UIMA component repository","author":"Hahn","year":"2008","journal-title":"Proceedings of the Language Resources and Evaluation Conference (LREC); Marrakech, Morocco"},{"key":"2020110613011387700_ocv037-B22","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1093\/bioinformatics\/btm557","article-title":"Text processing through Web services: calling Whatizit","volume":"24","author":"Rebholz-Schuhmann","year":"2008","journal-title":"Bioinformatics."},{"key":"2020110613011387700_ocv037-B23"},{"key":"2020110613011387700_ocv037-B24"},{"key":"2020110613011387700_ocv037-B25","author":"Rebholz-Schuhmann"},{"key":"2020110613011387700_ocv037-B26","first-page":"3894","article-title":"Centroids: gold standards with distributional variation","author":"Lewin","year":"2012","journal-title":"Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012); European Language Resources Association"},{"key":"2020110613011387700_ocv037-B27","article-title":"Deriving an English biomedical silver standard corpus for CLEF-ER. Conference and Labs of the Evaluation Forum (CLEF) 2013. CLEF-ER working notes.","author":"Lewin"},{"key":"2020110613011387700_ocv037-B28"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/22\/5\/948\/34146393\/ocv037.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/22\/5\/948\/34146393\/ocv037.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T18:47:42Z","timestamp":1604688462000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/22\/5\/948\/930067"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,5,5]]},"references-count":28,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2015,5,5]]},"published-print":{"date-parts":[[2015,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocv037","relation":{},"ISSN":["1527-974X","1067-5027"],"issn-type":[{"value":"1527-974X","type":"electronic"},{"value":"1067-5027","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,9]]},"published":{"date-parts":[[2015,5,5]]}}}