{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,23]],"date-time":"2025-10-23T16:33:44Z","timestamp":1761237224926},"reference-count":18,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2005,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene\/protein name recognition.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features \"internal resource features\", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-6-s1-s8","type":"journal-article","created":{"date-parts":[[2005,5,24]],"date-time":"2005-05-24T18:13:44Z","timestamp":1116958424000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":33,"title":["Gene\/protein name recognition based on support vector machine using dictionary as features"],"prefix":"10.1186","volume":"6","author":[{"given":"Tomohiro","family":"Mitsumori","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sevrani","family":"Fation","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Masaki","family":"Murata","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kouichi","family":"Doi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hirohumi","family":"Doi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2005,5,24]]},"reference":[{"issue":"2","key":"643_CR1","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1093\/bioinformatics\/17.2.155","volume":"17","author":"T Ono","year":"2001","unstructured":"Ono T, Hishigaki H, Tanigami A, Takagi T: Automatic extraction of information on protein-protein interactions from biomedical literature. Bioinformatics 2001, 17(2):155\u2013161. 10.1093\/bioinformatics\/17.2.155","journal-title":"Bioinformatics"},{"key":"643_CR2","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1002\/cfg.91","volume":"2","author":"C Blaschke","year":"2001","unstructured":"Blaschke C, Valencia A: Can bibliographic pointers for known biological data be found automatically? Protein interactions as a case study. Comparative and Functional Genomics 2001, 2: 196\u2013206. 10.1002\/cfg.91","journal-title":"Comparative and Functional Genomics"},{"issue":"16","key":"643_CR3","doi-asserted-by":"publisher","first-page":"2046","DOI":"10.1093\/bioinformatics\/btg279","volume":"19","author":"JM Temkin","year":"2003","unstructured":"Temkin JM, Gilder MR: Extraction of protein interaction information from unstructured text using a context-free grammar. Bioinformatics 2003, 19(16):2046\u20132053. 10.1093\/bioinformatics\/btg279","journal-title":"Bioinformatics"},{"key":"643_CR4","first-page":"707","volume-title":"Proceedings of the Pacific Symposium on Biocomputing","author":"K Fukuda","year":"1998","unstructured":"Fukuda K, Tamura A, Tsunoda T, Takagi T: Toward Information Extraction: Identifying protein names from biological papers. Proceedings of the Pacific Symposium on Biocomputing 1998, 707\u2013718."},{"key":"643_CR5","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1016\/S1386-5056(02)00052-7","volume":"67","author":"K Franz\u00e9n","year":"2002","unstructured":"Franz\u00e9n K, Eriksson G, Asker FOL, Lid\u00e9n P, C\u00f6ster J: Protein names and how to find them. International Journal of Medical Informatics 2002, 67: 49\u201361. 10.1016\/S1386-5056(02)00052-7","journal-title":"International Journal of Medical Informatics"},{"key":"643_CR6","doi-asserted-by":"publisher","first-page":"201","DOI":"10.3115\/990820.990850","volume-title":"Proceedings of the 18th International Conference on Computational Linguistics (COLING'2000)","author":"N Collier","year":"2000","unstructured":"Collier N, Nobata C, Tsujii J: Extracting the Names of Genes and Gene Production with a Hidden Marcov Model. Proceedings of the 18th International Conference on Computational Linguistics (COLING'2000) 2000, 201\u2013207."},{"key":"643_CR7","doi-asserted-by":"publisher","first-page":"49","DOI":"10.3115\/1118958.1118965","volume-title":"Proceedings of the ACL 2003 Workshop on NLP in Biomedicine","author":"D Shen","year":"2003","unstructured":"Shen D, Zhang J, Zhou G, Su J, Tan CL: Effective Adaptation of a Hidden Marcov Model-based Named Entity Recognizer for Biomedical Domain. Proceedings of the ACL 2003 Workshop on NLP in Biomedicine 2003, 49\u201356."},{"key":"643_CR8","first-page":"1","volume-title":"Proceedings of the Natural Language Processing in the Biomedical Domain (ACL2002)","author":"J Kazama","year":"2002","unstructured":"Kazama J, Makino T, Ohta Y, Tsujii J: Tuning Support Vector Machines for Biomedical Named Entity Recognition. Proceedings of the Natural Language Processing in the Biomedical Domain (ACL2002) 2002, 1\u20138."},{"key":"643_CR9","doi-asserted-by":"publisher","first-page":"33","DOI":"10.3115\/1118958.1118963","volume-title":"Proceedings of the ACL 2003 Workshop on NLP in Biomedicine","author":"KJ Lee","year":"2003","unstructured":"Lee KJ, Hwang YS, Rim HC: Two-Phase Biomedical NE Recognition based on SVMs. Proceedings of the ACL 2003 Workshop on NLP in Biomedicine 2003, 33\u201340."},{"key":"643_CR10","doi-asserted-by":"publisher","first-page":"57","DOI":"10.3115\/1118958.1118966","volume-title":"Proceedings of the ACL 2003 Workshop on NLP in Biomedicine","author":"K Takeuchi","year":"2003","unstructured":"Takeuchi K, Collier N: Bio-Medical Entity Extraction using Support Vector Machine. Proceedings of the ACL 2003 Workshop on NLP in Biomedicine 2003, 57\u201364."},{"key":"643_CR11","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1007\/978-3-540-24630-5_21","volume-title":"Proceedings of Computational Linguistics and Intelligent Text Processing (CICLing 2004)","author":"T Mitsumori","year":"2004","unstructured":"Mitsumori T, Fation S, Murata M, Doi K, Doi H: Boundary Correction of Protein Names Adapting Heuristic Rules. Proceedings of Computational Linguistics and Intelligent Text Processing (CICLing 2004) 2004, 172\u2013175."},{"key":"643_CR12","doi-asserted-by":"publisher","first-page":"41","DOI":"10.3115\/1118958.1118964","volume-title":"Proceedings of the ACL 2003 Workshop on NLP in Biomedicine","author":"Y Tsuruoka","year":"2003","unstructured":"Tsuruoka Y, Tsujii J: Boosting Precision and Recall of Dictionary-Based Protein Name Recognition. Proceedings of the ACL 2003 Workshop on NLP in Biomedicine 2003, 41\u201348."},{"key":"643_CR13","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1093\/nar\/gkg095","volume":"31","author":"B Boeckmann","year":"2003","unstructured":"Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TeEMBL in 2003. Nucleic Acids Research 2003, 31: 365\u2013370. 10.1093\/nar\/gkg095","journal-title":"Nucleic Acids Research"},{"key":"643_CR14","first-page":"192","volume-title":"Proceedings of Second Meeting of North American Chapter of the Association for Computational Linguistics(NAACL)","author":"T Kudo","year":"2001","unstructured":"Kudo T, Matsumoto Y: Chunking with Support Vector Machines. Proceedings of Second Meeting of North American Chapter of the Association for Computational Linguistics(NAACL) 2001, 192\u2013199."},{"key":"643_CR15","first-page":"722","volume-title":"Proceedings of the National Conference on Artificial Intelligence AAAI Press","author":"E Brill","year":"1994","unstructured":"Brill E: Some advances in transformation-based part of speech tagging. Proceedings of the National Conference on Artificial Intelligence AAAI Press 1994, 722\u2013727."},{"key":"643_CR16","first-page":"947","volume-title":"18th International Conference on Computational Linguistics (COLING 2000)","author":"A Yeh","year":"2000","unstructured":"Yeh A: More accurate tests for the statistical significance of result differences. 18th International Conference on Computational Linguistics (COLING 2000) 2000, 947\u2013953."},{"key":"643_CR17","volume-title":"Proceedings of the Human Language Technology Conference (HLT 2002)","author":"T Ohta","year":"2002","unstructured":"Ohta T, Tateisi Y, Kim JD: The GENIA Corpus: an Annotated Research Abstract Corpus in Molecular Biology Domain. Proceedings of the Human Language Technology Conference (HLT 2002) 2002."},{"key":"643_CR18","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The Nature of Statistical Learning Theory","author":"VN Vapnik","year":"1995","unstructured":"Vapnik VN: The Nature of Statistical Learning Theory. Springer; 1995."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-S1-S8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T10:12:37Z","timestamp":1630491157000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-S1-S8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5]]},"references-count":18,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2005,5]]}},"alternative-id":["643"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-s1-s8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,5]]},"assertion":[{"value":"24 May 2005","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S8"}}