{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,1,16]],"date-time":"2024-01-16T18:24:35Z","timestamp":1705429475357},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The recognition of named entities (NER) is an elementary task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best performing solutions combine the outputs from selected annotation solutions measured against a single corpus. However, little effort has been spent on a systematic analysis of methods harmonizing the annotation results and measuring against a combination of Gold Standard Corpora (GSCs).<\/jats:p>\n               <jats:p>Results: We present Totum, a machine learning solution that harmonizes gene\/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. The performed experiments show that our approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving \u224870%) in exact alignment and 22% (achieving \u224882%) in nested alignment. 
We demonstrate that our solution delivers reliable annotation results across the GSCs and it is an important contribution towards a homogeneous annotation of MEDLINE abstracts.<\/jats:p>\n               <jats:p>Availability and implementation: Totum is implemented in Java and its resources are available at http:\/\/bioinformatics.ua.pt\/totum<\/jats:p>\n               <jats:p>Contact: \u00a0david.campos@ua.pt; rebholz@ebi.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts125","type":"journal-article","created":{"date-parts":[[2012,3,15]],"date-time":"2012-03-15T05:33:56Z","timestamp":1331789636000},"page":"1253-1261","source":"Crossref","is-referenced-by-count":11,"title":["Harmonization of gene\/protein annotations: towards a gold standard MEDLINE"],"prefix":"10.1093","volume":"28","author":[{"given":"David","family":"Campos","sequence":"first","affiliation":[{"name":"1 University of Aveiro, IEETA\/DETI, Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro, Portugal and 2European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK"}]},{"given":"S\u00e9rgio","family":"Matos","sequence":"additional","affiliation":[{"name":"1 University of Aveiro, IEETA\/DETI, Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro, Portugal and 2European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK"}]},{"given":"Ian","family":"Lewin","sequence":"additional","affiliation":[{"name":"1 University of Aveiro, IEETA\/DETI, Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro, Portugal and 2European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK"}]},{"given":"Jos\u00e9 Lu\u00eds","family":"Oliveira","sequence":"additional","affiliation":[{"name":"1 University of Aveiro, IEETA\/DETI, Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro, 
Portugal and 2European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK"}]},{"given":"Dietrich","family":"Rebholz-Schuhmann","sequence":"additional","affiliation":[{"name":"1 University of Aveiro, IEETA\/DETI, Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro, Portugal and 2European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK"}]}],"member":"286","published-online":{"date-parts":[[2012,3,13]]},"reference":[{"key":"2023012512233372600_B1","first-page":"101","article-title":"BioCreative II gene mention tagging system at IBM Watson","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop.","author":"Ando","year":"2007"},{"key":"2023012512233372600_B2","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-59745-535-0_4","article-title":"UniProtKB\/Swiss-Prot","volume-title":"Plant Bioinformatics: Methods and Protocols (Series: Methods in Molecular Biology)","author":"Boutet","year":"2007"},{"key":"2023012512233372600_B3","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.artmed.2004.07.016","article-title":"Comparative experiments on learning information extractors for proteins and their interactions","volume":"33","author":"Bunescu","year":"2005","journal-title":"Artif. Intell. Med."},{"key":"2023012512233372600_B4","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1023\/A:1009715923555","article-title":"A tutorial on support vector machines for pattern recognition","volume":"2","author":"Burges","year":"1998","journal-title":"Data Min. Knowl. Disc."},{"issue":"Suppl. 1","key":"2023012512233372600_B5","doi-asserted-by":"crossref","first-page":"S12","DOI":"10.1186\/1471-2105-6-S1-S12","article-title":"Data preparation and interannotator agreement: BioCreAtIvE task 1B","volume":"6","author":"Colosimo","year":"2005","journal-title":"BMC Bioinformatics"},{"issue":"Suppl. 
1","key":"2023012512233372600_B6","first-page":"D344","article-title":"Chebi: a database and ontology for chemical entities of biological interest","volume":"36","author":"Degtyarenko","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012512233372600_B7","first-page":"28","article-title":"Semantic annotations for biology\u2014a corpus development initiative at the Jena University Language & Information Engineering (JULIE) Lab","volume-title":"LREC 2008\u2013Proceedings of the 6th International Conference on Language Resources and Evaluation.","author":"Hahn","year":"2008"},{"issue":"Suppl. 1","key":"2023012512233372600_B8","first-page":"D514","article-title":"Online mendelian inheritance in man (omim), a knowledgebase of human genes and genetic disorders","volume":"33","author":"Hamosh","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012512233372600_B9","doi-asserted-by":"crossref","first-page":"i286","DOI":"10.1093\/bioinformatics\/btn183","article-title":"Integrating high dimensional bi-directional parsing models for gene mention tagging","volume":"24","author":"Hsu","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512233372600_B10","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1093\/bioinformatics\/btg1023","article-title":"GENIA corpus\u2013a semantically annotated corpus for bio-textmining","volume":"19","author":"Kim","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012512233372600_B11","first-page":"70","article-title":"Introduction to the bio-entity recognition task at JNLPBA","volume-title":"Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications.","author":"Kim","year":"2004"},{"key":"2023012512233372600_B12","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1016\/j.ijmedinf.2005.06.011","article-title":"Distributed modules for text annotation and IE applied to the biomedical 
domain","volume":"75","author":"Kirsch","year":"2006","journal-title":"Int. J. Med. Inform."},{"key":"2023012512233372600_B13","article-title":"Integrated annotation for biomedical information extraction","volume-title":"Proceedings of the Human Language Technology Conference and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT\/NAACL)","author":"Kulick","year":"2004"},{"key":"2023012512233372600_B14","first-page":"105","article-title":"Rich feature set, unification of bidirectional parsing and dictionary filtering for high F-score gene mention tagging","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop.","author":"Kuo","year":"2007"},{"key":"2023012512233372600_B15","article-title":"Conditional random fields: probabilistic models for segmenting and labeling sequence data","volume-title":"Proceedings of the Eighteenth International Conference on Machine Learning (ICML-2001).","author":"Lafferty","year":"2001"},{"key":"2023012512233372600_B16","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1093\/nar\/26.1.94","article-title":"Gdb: the human genome database","volume":"26","author":"Letovsky","year":"1998","journal-title":"Nucleic Acids Res."},{"key":"2023012512233372600_B17","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1093\/bioinformatics\/bti749","article-title":"Biothesaurus: a web-based thesaurus of protein and gene names","volume":"22","author":"Liu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512233372600_B18","first-page":"1","article-title":"Integrating divergent models for gene mention tagging","volume-title":"IEEE International Conference on Natural Language Processing and Knowledge Engineering, 2009 (NLP-KE 2009)","author":"Li","year":"2009"},{"issue":"Suppl. 
1","key":"2023012512233372600_B19","first-page":"D54","article-title":"Entrez gene: gene-centered information at NCBI","volume":"33","author":"Maglott","year":"2005","journal-title":"Nucleic Acids Research"},{"key":"2023012512233372600_B20","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1002\/cfg.452","article-title":"Protein name tagging guidelines: lessons learned","volume":"6","author":"Mani","year":"2005","journal-title":"Comp. Funct. Genom."},{"key":"2023012512233372600_B21","author":"McCallum","year":"2002","journal-title":"MALLET: A Machine Learning for Language Toolkit."},{"issue":"Suppl. 1","key":"2023012512233372600_B22","doi-asserted-by":"crossref","first-page":"i241","DOI":"10.1093\/bioinformatics\/bth904","article-title":"Protein names precisely peeled off free text","volume":"20","author":"Mika","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012512233372600_B23","article-title":"IeXML: towards an annotation framework for biomedical semantic types enabling interoperability of text processing modules","volume-title":"Proceedings of BioLink, ISMB 2006.","author":"Rebholz\u2013Schuhmann","year":"2006"},{"key":"2023012512233372600_B24","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1142\/S0219720010004562","article-title":"CALBC silver standard corpus","volume":"8","author":"Rebholz-Schuhmann","year":"2010","journal-title":"J. Bioinform. Comput. 
Biol."},{"key":"2023012512233372600_B25","doi-asserted-by":"crossref","first-page":"142","DOI":"10.3115\/1119176.1119195","article-title":"Introduction to the CoNLL-2003 shared task: language-independent named entity recognition","volume-title":"Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003.","author":"Sang","year":"2003"},{"key":"2023012512233372600_B26","doi-asserted-by":"crossref","first-page":"3191","DOI":"10.1093\/bioinformatics\/bti475","article-title":"Abner: an open source tool for automatically tagging genes, proteins and other entity names in text","volume":"21","author":"Settles","year":"2005","journal-title":"Bioinformatics"},{"issue":"Suppl. 2","key":"2023012512233372600_B27","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/gb-2008-9-s2-s2","article-title":"Overview of BioCreative II gene mention recognition","volume":"9","author":"Smith","year":"2008","journal-title":"Genome Biol."},{"key":"2023012512233372600_B28","article-title":"An Introduction to Conditional Random Fields for Relational Learning","volume-title":"Introduction to Statistical Relational Learning.","author":"Sutton","year":"2006"},{"issue":"Suppl. 1","key":"2023012512233372600_B29","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1471-2105-6-S1-S3","article-title":"GENETAG: a tagged corpus for gene\/protein named entity recognition","volume":"6","author":"Tanabe","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012512233372600_B30","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1197\/jamia.M2844","article-title":"BioTagger-GM: a gene\/protein name recognition system","volume":"16","author":"Torii","year":"2009","journal-title":"J. Am. Med. Inform. Assoc."},{"issue":"Suppl. 
1","key":"2023012512233372600_B31","doi-asserted-by":"crossref","first-page":"D255","DOI":"10.1093\/nar\/gkh072","article-title":"Genew: the human gene nomenclature database, 2004 updates","volume":"32","author":"Wain","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012512233372600_B32","article-title":"Conditional random fields: an introduction","volume-title":"Rapport technique MS-CIS-04-21","author":"Wallach","year":"2004"},{"key":"2023012512233372600_B33","first-page":"7","article-title":"Biocreative 2. Gene mention task","volume-title":"Proceedings of the Second Biocreative Challenge Evaluation Workshop","author":"Wilbur","year":"2007"},{"key":"2023012512233372600_B34","doi-asserted-by":"crossref","first-page":"1178","DOI":"10.1093\/bioinformatics\/bth060","article-title":"Recognizing names in biomedical texts: a machine learning approach","volume":"20","author":"Zhou","year":"2004","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/9\/1253\/48879521\/bioinformatics_28_9_1253.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/9\/1253\/48879521\/bioinformatics_28_9_1253.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T15:54:01Z","timestamp":1674662041000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/9\/1253\/312384"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,3,13]]},"references-count":34,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2012,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts125","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"el
ectronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,5,1]]},"published":{"date-parts":[[2012,3,13]]}}}