{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T15:18:30Z","timestamp":1759936710353},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":3015,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Tagging gene and gene product mentions in scientific text is an important initial step of literature mining. In this article, we describe in detail our gene mention tagger that participated in the BioCreative 2 challenge and analyze what contributes to its good performance. Our tagger is based on the conditional random fields model (CRF), the most prevalent method for the gene mention tagging task in BioCreative 2. Our tagger is interesting because it achieved the highest F-score among CRF-based methods and the second highest overall. Moreover, we obtained our results mostly by applying open source packages, making them easy to reproduce.<\/jats:p>\n               <jats:p>Results: We first describe in detail how we developed our CRF-based tagger. We designed a very high dimensional feature set that includes most of the information that may be relevant. We trained bi-directional CRF models with the same set of features, one applying forward parsing and the other backward parsing, and integrated the two models based on their output scores and dictionary filtering. One of the most prominent factors that contribute to the good performance of our tagger is the integration of an additional backward parsing model. However, from the definition of CRF, it appears that a CRF model is symmetric and that bi-directional parsing models should produce the same results. We show that, due to different feature settings, a CRF model can be asymmetric, and that the feature setting for our tagger in BioCreative 2 not only produces different results but also gives backward parsing models a slight but consistent advantage over forward parsing models. To fully explore the potential of integrating bi-directional parsing models, we applied different asymmetric feature settings to generate many bi-directional parsing models and integrated them based on their output scores. Experimental results show that this integrated model can achieve an even higher F-score using only the training corpus for gene mention tagging.<\/jats:p>\n               <jats:p>Availability: Data sets, programs and an on-line service of our gene mention tagger can be accessed at http:\/\/aiia.iis.sinica.edu.tw\/biocreative2.htm<\/jats:p>\n               <jats:p>Contact: chunnan@iis.sinica.edu.tw<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn183","type":"journal-article","created":{"date-parts":[[2008,6,27]],"date-time":"2008-06-27T07:43:13Z","timestamp":1214552593000},"page":"i286-i294","source":"Crossref","is-referenced-by-count":60,"title":["Integrating high dimensional bi-directional parsing models for gene mention tagging"],"prefix":"10.1093","volume":"24","author":[{"given":"Chun-Nan","family":"Hsu","sequence":"first","affiliation":[{"name":"Institute of Information Science, Academia Sinica, and Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu-Ming","family":"Chang","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Academia Sinica, and Institute of Biomedical Informatics, National Yang-Ming University, Taipei, 
Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cheng-Ju","family":"Kuo","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Academia Sinica, and Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu-Shi","family":"Lin","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Academia Sinica, and Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Han-Shen","family":"Huang","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Academia Sinica, and Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"I-Fang","family":"Chung","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Academia Sinica, and Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2008,7,1]]},"reference":[{"key":"2023020210401418100_B1","first-page":"101","article-title":"BioCreative II gene mention tagging system at IBM Watson","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop","author":"Ando","year":"2007"},{"key":"2023020210401418100_B2","first-page":"92","article-title":"Combining labeled and unlabeled data with co-training","author":"Blum","year":"1998"},{"key":"2023020210401418100_B3","doi-asserted-by":"crossref","first-page":"1470","DOI":"10.1214\/aoms\/1177692379","article-title":"Generalized iterative scaling for log-linear 
models","volume":"43","author":"Darroch","year":"1972","journal-title":"Ann. Math. Stat"},{"key":"2023020210401418100_B4","doi-asserted-by":"crossref","first-page":"D319","DOI":"10.1093\/nar\/gkj147","article-title":"The HUGO gene nomenclature database, 2006 updates","volume":"34","author":"Eyre","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020210401418100_B5","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-6-S1-S5","article-title":"Exploring the boundaries: gene and protein identification in biomedical text","volume":"6","author":"Finkel","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020210401418100_B6","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop","author":"Hirschman","year":"2007"},{"key":"2023020210401418100_B7","article-title":"Global and componentwise extrapolations for accelerating training of Bayesian networks and conditional random fields","volume-title":"Technical Report TR-IIS-07-013","author":"Hsu","year":"2007"},{"key":"2023020210401418100_B8","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1109\/3477.990870","article-title":"Bayesian classification for data from the same unknown class","volume":"32","author":"Huang","year":"2002","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics Part B"},{"key":"2023020210401418100_B9","first-page":"109","article-title":"High-recall gene mention recognition by unification of multiple backward parsing models","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop","author":"Huang","year":"2007"},{"key":"2023020210401418100_B10","first-page":"105","article-title":"Named entity recognition with combinations of conditional random fields","author":"Klinger","year":"2007"},{"key":"2023020210401418100_B11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3115\/1073336.1073361","article-title":"Chunking with support vector machines","volume-title":"NAACL'01: Second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies 2001","author":"Kudo","year":"2001"},{"key":"2023020210401418100_B12","unstructured":"Kudo T. CRF++: yet another CRF toolkit. 2005. http:\/\/crfpp.sourceforge.net\/"},{"key":"2023020210401418100_B13","first-page":"105","article-title":"Rich feature set, unification of bidirectional parsing and dictionary filtering for high F-score gene mention tagging","author":"Kuo","year":"2007"},{"key":"2023020210401418100_B14","first-page":"282","article-title":"Conditional random fields: probabilistic models for segmenting and labeling sequence data","author":"Lafferty","year":"2001"},{"key":"2023020210401418100_B15","unstructured":"McCallum AK. MALLET: a machine learning for language toolkit. 2002. http:\/\/mallet.cs.umass.edu (last accessed May 18, 2008)"},{"key":"2023020210401418100_B16","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1186\/1471-2105-6-S1-S6","article-title":"Identifying gene and protein mentions in text using conditional random fields","volume":"6","author":"McDonald","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020210401418100_B17","doi-asserted-by":"crossref","first-page":"S8","DOI":"10.1186\/1471-2105-6-S1-S8","article-title":"Gene\/protein name recognition based on support vector machine using dictionary as features","volume":"6","author":"Mitsumori","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020210401418100_B18","doi-asserted-by":"crossref","DOI":"10.1007\/b98874","volume-title":"Numerical Optimization","author":"Nocedal","year":"1999"},{"key":"2023020210401418100_B19","doi-asserted-by":"crossref","unstructured":"Sha F, Pereira F. Shallow parsing with conditional random fields. Proceedings of Human Language Technology, the North American Chapter of the Association for Computational Linguistics (NAACL'03), 2003, pp. 213-220. Edmonton, Canada. Association for Computational Linguistics, Morristown, NJ, USA. 10.3115\/1073445.1073473","DOI":"10.3115\/1073445.1073473"},{"key":"2023020210401418100_B20","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1007\/11573036_36","article-title":"Developing a robust part-of-speech tagger for biomedical text","volume-title":"Advances in Informatics, Lecture Notes in Computer Science","author":"Tsuruoka","year":"2005"},{"key":"2023020210401418100_B21","first-page":"7","article-title":"BioCreative 2. 
gene mention task","author":"Wilbur","year":"2007"},{"key":"2023020210401418100_B22","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1016\/j.ijmedinf.2005.06.012","article-title":"Recognizing names in biomedical texts using mutual information independence model and SVM plus sigmoid","volume":"75","author":"Zhou","year":"2006","journal-title":"Int. J. Med. Inform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/i286\/49053238\/bioinformatics_24_13_i286.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/i286\/49053238\/bioinformatics_24_13_i286.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T12:24:40Z","timestamp":1675340680000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/13\/i286\/236147"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,7,1]]},"references-count":22,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2008,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn183","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,7,1]]},"published":{"date-parts":[[2008,7,1]]}}}