{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T11:11:10Z","timestamp":1770721870223,"version":"3.49.0"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"S1","license":[{"start":{"date-parts":[[2015,1,19]],"date-time":"2015-01-19T00:00:00Z","timestamp":1421625600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2015,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature.<\/jats:p>\n            <jats:p>We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to leverage system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than the free text in the domain, the proposed method does not rely on any lexicon nor any dictionary in order to keep the system applicable to other NER tasks in bio-text data.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER, which is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER Unstructured Information Management Architecture (UIMA) interface.<\/jats:p>\n            <jats:p>BANNER-CHEMDNER achieved an 85.68% and an 86.47% F-measure on the testing sets of CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and achieved an 87.04% F-measure on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. BANNER-CHEMDNER system is available at: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/bitbucket.org\/tsendeemts\/banner-chemdner\" ext-link-type=\"uri\">https:\/\/bitbucket.org\/tsendeemts\/banner-chemdner<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1758-2946-7-s1-s9","type":"journal-article","created":{"date-parts":[[2015,6,18]],"date-time":"2015-06-18T09:39:15Z","timestamp":1434620355000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":28,"title":["Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations"],"prefix":"10.1186","volume":"7","author":[{"given":"Tsendsuren","family":"Munkhdalai","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Meijing","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Khuyagbaatar","family":"Batsuren","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hyeon Ah","family":"Park","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nak Hyeon","family":"Choi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Keun Ho","family":"Ryu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2015,1,19]]},"reference":[{"key":"628_CR1","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1007\/s11390-010-9313-5","volume":"25","author":"HJ Dai","year":"2010","unstructured":"Dai HJ, Chang YC, Tsai RTH, Hsu WL: New Challenges for Biological Text-Mining in the Next Decade. Journal of computer science and technology. 2010, 25: 169-179. 10.1007\/s11390-010-9313-5.","journal-title":"Journal of computer science and technology"},{"key":"628_CR2","doi-asserted-by":"publisher","first-page":"1633","DOI":"10.1093\/bioinformatics\/bts183","volume":"28","author":"T Rockt\u00e4schel","year":"2012","unstructured":"Rockt\u00e4schel T, Weidlich M, Leser U: ChemSpot: a hybrid system for chemical named entity recognition. Bioinformatics. 2012, 28: 1633-1640. 10.1093\/bioinformatics\/bts183.","journal-title":"Bioinformatics"},{"key":"628_CR3","doi-asserted-by":"publisher","first-page":"2983","DOI":"10.1093\/bioinformatics\/btp535","volume":"25","author":"KM Hettne","year":"2009","unstructured":"Hettne KM, Stierum RH, Schuemie MJ, Hendriksen PJM, Schijvenaars BJA, Mulligen EMV, Kleinjans J, Kors JA: A dictionary to identify small molecules and drugs in free text. Bioinformatics. 2009, 25: 2983-2991. 10.1093\/bioinformatics\/btp535.","journal-title":"Bioinformatics"},{"key":"628_CR4","doi-asserted-by":"publisher","first-page":"816","DOI":"10.1016\/j.drudis.2008.06.001","volume":"13","author":"I Segura-Bedmar","year":"2008","unstructured":"Segura-Bedmar I, Mart\u00ednez P, Segura-Bedmar M: Drug name recognition and classification in biomedical texts: A case study outlining approaches underpinning automated systems. Drug Discovery Today. 2008, 13: 816-823. 10.1016\/j.drudis.2008.06.001.","journal-title":"Drug Discovery Today"},{"key":"628_CR5","first-page":"84","volume-title":"Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"S Zhao","year":"2004","unstructured":"Zhao S: Named Entity Recognition in Biomedical Texts using an HMM Model. Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Edited by: Nigel C. National Institute of Informatics, Patrick R. University Hospital of Geneva and EPFL, Adeline N. LIPN. 2004, 84-87."},{"key":"628_CR6","first-page":"96","volume-title":"Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"G Zhou","year":"2004","unstructured":"Zhou G, Su J: Exploring deep knowledge resources in biomedical name recognition. Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Edited by: Nigel C. National Institute of Informatics, Patrick R. University Hospital of Geneva and EPFL, Adeline N. LIPN. 2004, 96-99."},{"key":"628_CR7","doi-asserted-by":"publisher","first-page":"S8","DOI":"10.1186\/1471-2105-6-S1-S8","volume":"6","author":"T Mitsumori","year":"2005","unstructured":"Mitsumori T, Fation S, Murata M, Doi K, Doi H: Gene\/protein name recognition based on support vector machine using dictionary as features. BMC Bioinformatics. 2005, 6: S8-","journal-title":"BMC Bioinformatics"},{"key":"628_CR8","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1016\/j.jbi.2004.08.008","volume":"37","author":"N Collier","year":"2004","unstructured":"Collier N, Takeuchi K: Comparison of character-level and part of speech features for name recognition in biomedical texts. Journal of Biomedical Informatics. 2004, 37: 423-35. 10.1016\/j.jbi.2004.08.008.","journal-title":"Journal of Biomedical Informatics"},{"key":"628_CR9","first-page":"88","volume-title":"Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"J Finkel","year":"2004","unstructured":"Finkel J, Dingare S, Nguyen H, Nissim M, Manning C, Sinclair G: Exploiting context for biomedical entity recognition: from syntax to the web. Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Edited by: Nigel C. National Institute of Informatics, Patrick R. University Hospital of Geneva and EPFL, Adeline N. LIPN. 2004, 88-91."},{"issue":"Suppl 11","key":"628_CR10","doi-asserted-by":"publisher","first-page":"S4","DOI":"10.1186\/1471-2105-9-S11-S4","volume":"9","author":"P Corbett","year":"2008","unstructured":"Corbett P, Copestake A: Cascaded classifiers for confidence-based chemical named entity recognition. BMC Bioinformatics. 2008, 9 (Suppl 11): S4-10.1186\/1471-2105-9-S11-S4.","journal-title":"BMC Bioinformatics"},{"key":"628_CR11","first-page":"104","volume-title":"Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"B Settles","year":"2004","unstructured":"Settles B: Biomedical named entity recognition using conditional random fields and rich feature sets. Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Edited by: Nigel C. National Institute of Informatics, Patrick R. University Hospital of Geneva and EPFL, Adeline N. LIPN. 2004, 104-107."},{"key":"628_CR12","doi-asserted-by":"publisher","first-page":"i286","DOI":"10.1093\/bioinformatics\/btn183","volume":"24","author":"C Hsu","year":"2008","unstructured":"Hsu C, Chang Y, Kuo C, Lin Y, Huang H, Chung I: Integrating high dimensional bi-directional parsing models for gene mention tagging. Bioinformatics. 2008, 24: i286-i294. 10.1093\/bioinformatics\/btn183.","journal-title":"Bioinformatics"},{"key":"628_CR13","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1186\/1471-2105-10-223","volume":"10","author":"Y Li","year":"2009","unstructured":"Li Y, Lin H, Yang Z: Incorporating rich background knowledge for gene named entity classification and recognition. BMC Bioinformatics. 2009, 10: 223-10.1186\/1471-2105-10-223.","journal-title":"BMC Bioinformatics"},{"key":"628_CR14","first-page":"857","volume-title":"Proceedings 26th International Conference on Advanced Information Networking and Applications Workshops","author":"T Munkhdalai","year":"2012","unstructured":"Munkhdalai T, Li M, Kim T, Namsrai O, Jeong S, Shin J, Ryu KH: Bio Named Entity Recognition based on Co-training Algorithm. Proceedings 26th International Conference on Advanced Information Networking and Applications Workshops. Edited by: Leonard B. Fukuoka Institute of Technology, Tomoya E. Rissho University, Fatos X. Technical University of Catalonia, Makoto T. Seikei University. 2012, 857-862."},{"key":"628_CR15","doi-asserted-by":"publisher","first-page":"575","DOI":"10.3745\/JIPS.2012.8.4.575","volume":"8","author":"T Munkhdalai","year":"2012","unstructured":"Munkhdalai T, Li M, Yun U, Namsrai O, Ryu KH: An Active Co-Training Algorithm for Biomedical Named-Entity Recognition. Journal of Information Processing Systems. 2012, 8: 575-588. 10.3745\/JIPS.2012.8.4.575.","journal-title":"Journal of Information Processing Systems"},{"key":"628_CR16","first-page":"384","volume-title":"Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics","author":"J Turian","year":"2010","unstructured":"Turian J, Ratinov L, Bengio Y: Word representations: A simple and general method for semi-supervised learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 2010, 384-394."},{"key":"628_CR17","volume-title":"Proceedings of Workshop at ICLR","author":"T Mikolov","year":"2013","unstructured":"Mikolov T, Chen K, Corrado G, Dean J: Efficient Estimation of Word Representations in Vector Space. Proceedings of Workshop at ICLR. 2013"},{"key":"628_CR18","volume-title":"Proceedings of the 28th International Conference on Machine Learning","author":"R Socher","year":"2011","unstructured":"Socher R, Lin CC, Ng AY, Manning CD: Parsing Natural Scenes and Natural Language with Recursive Neural Networks. Proceedings of the 28th International Conference on Machine Learning. 2011"},{"key":"628_CR19","first-page":"S1","volume":"12","author":"CN Arighi","year":"2011","unstructured":"Arighi CN, Lu Z, Krallinger M, Cohen KB, Wilbur JW, Valencia A, Hirschman L, Wu CH: Overview of the BioCreative III Workshop. Bioinformatics. 2011, 12: S1-","journal-title":"Bioinformatics"},{"key":"628_CR20","first-page":"6","volume":"1","author":"CN Arighi","year":"2014","unstructured":"Arighi CN, Wu CH, Cohen KB, Hirschman L, Krallinger M, Valencia A, Lu Z, Wilbur JW, Wiegers TC: BioCreative-IV virtual issue. Database. 2014, 1: 6-","journal-title":"Database"},{"issue":"Suppl 2","key":"628_CR21","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1758-2946-7-S1-S2","volume":"7","author":"M Krallinger","year":"2015","unstructured":"Krallinger M, Rabal O, Leitner F, Vazquez M, Salgado D, Lu Z, Leaman R, Lu Y, Ji D, Lowe DM, Sayle RA, Batista-Navarro RT, Rak R, Huber T, Rockt\u00e4schel T, Matos S, Campos D, Tang B, Xu H, Munkhdalai T, Ryu KH, Ramanan SV, Nathan S, \u017ditnik S, Bajec M, Weber L, Irmer M, Akhondi SA, Kors JA, Xu S, An X, Sikdar UK, Ekbal A, Yoshioka M, Dieb TM, Choi M, Verspoor K, Khabsa M, Giles CL, Liu H, Ravikumar KE, Lamurias A, Couto FM, Dai HJ, Tzong-Han Tsai R, Ata C, Can T, Usi\u00e9 A, Alves R, Segura-Bedmar I, Mart\u00ednez P, Oyarzabal J, Valencia A: The CHEMDNER corpus of chemicals and drugs and its annotation principles. Journal of Cheminformatics. 2015, 7 (Suppl 2): S2-","journal-title":"Journal of Cheminformatics"},{"key":"628_CR22","first-page":"652","volume-title":"Proceedings of Pacific Symposium on Biocomputing","author":"R Leaman","year":"2008","unstructured":"Leaman R, Gonzalez G: BANNER: an executable survey of advances in biomedical named entity recognition. Proceedings of Pacific Symposium on Biocomputing. 2008, 652-663."},{"key":"628_CR23","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1093\/bioinformatics\/btt474","volume":"29","author":"R Leaman","year":"2013","unstructured":"Leaman R, Islamaj DR, Lu Z: DNorm: disease name normalization with pairwise learning to rank. Bioinformatics. 2013, 29: 22-10.1093\/bioinformatics\/bts639.","journal-title":"Bioinformatics"},{"key":"628_CR24","first-page":"11","volume":"29","author":"CH Wei","year":"2013","unstructured":"Wei CH, Harris BR, Kao HY, Lu Z: tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics. 2013, 29: 11-","journal-title":"Bioinformatics"},{"key":"628_CR25","doi-asserted-by":"publisher","first-page":"i382","DOI":"10.1093\/bioinformatics\/btq180","volume":"26","author":"J Bj\u00f6rne","year":"2010","unstructured":"Bj\u00f6rne J, Ginter F, Pyysalo S, Tsujii J, Salakoski T: Complex event extraction at PubMed scale. Bioinformatics. 2010, 26: i382-i390. 10.1093\/bioinformatics\/btq180.","journal-title":"Bioinformatics"},{"key":"628_CR26","doi-asserted-by":"publisher","first-page":"2154","DOI":"10.1093\/bioinformatics\/bts332","volume":"28","author":"M Gerner","year":"2012","unstructured":"Gerner M, Sarafraz F, Bergman CM, Nenadic G: BioContext: an integrated text mining system for large-scale extraction and contextualization of biomolecular events. Bioinformatics. 2012, 28: 2154-2161. 10.1093\/bioinformatics\/bts332.","journal-title":"Bioinformatics"},{"key":"628_CR27","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/2041-1480-3-3","volume":"3","author":"H Liu","year":"2012","unstructured":"Liu H, Christiansen T, Baumgartner WA, Verspoor K: BioLemmatizer: a lemmatization tool for morphological processing of biomedical text. Journal of Biomedical Semantic. 2012, 3: 3-10.1186\/2041-1480-3-3.","journal-title":"Journal of Biomedical Semantic"},{"key":"628_CR28","first-page":"467","volume":"18","author":"PF Brown","year":"1992","unstructured":"Brown PF, deSouza PV, Mercer RL, Pietra VJD, Lai JC: Class-Based n-gram Models of Natural Language. Computational Linguistics. 1992, 18: 467-479.","journal-title":"Computational Linguistics"},{"issue":"Suppl 2","key":"628_CR29","doi-asserted-by":"publisher","first-page":"S1","DOI":"10.1186\/1758-2946-7-S1-S1","volume":"7","author":"M Krallinger","year":"2015","unstructured":"Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A: CHEMDNER: The drugs and chemical names extraction challenge. Journal of Cheminformatics. 2015, 7 (Suppl 2): S1-","journal-title":"Journal of Cheminformatics"},{"key":"628_CR30","unstructured":"MALLET: A Machine Learning for Language Toolkit. [http:\/\/mallet.cs.umass.edu\/]"},{"key":"628_CR31","first-page":"135","volume-title":"Proceedings of the Fourth BioCreative Challenge Evaluation Workshop","author":"T Munkhdalai","year":"2013","unstructured":"Munkhdalai T, Li M, Batsuren K, Ryu KH: BANNER-CHEMDNER: Incorporating Domain Knowledge in Chemical and Drug Named Entity Recognition. Proceedings of the Fourth BioCreative Challenge Evaluation Workshop. 2013, 135-139."},{"key":"628_CR32","first-page":"101","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop","author":"RK Ando","year":"2007","unstructured":"Ando RK: BioCreative II Gene Mention Tagging System at IBM Watson. Proceedings of the Second BioCreative Challenge Evaluation Workshop. 2007, 101-103."},{"key":"628_CR33","first-page":"105","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop","author":"CJ Kuo","year":"2007","unstructured":"Kuo CJ, Chang YM, Huang HS, Lin KT, Yang BH, Lin YS, Hsu CN, Chung IF: Rich feature set, unification of bidirectional parsing and dictionary filtering for high F-score gene mention tagging. Proceedings of the Second BioCreative Challenge Evaluation Workshop. 2007, 105-107."},{"key":"628_CR34","first-page":"109","volume-title":"Proceedings of the Second BioCreative Challenge Evaluation Workshop","author":"HS Huang","year":"2007","unstructured":"Huang HS, Lin YS, Lin KT, Kuo CJ, Chang YM, Yang BH, Chung IF, Hsu CN: High-recall gene mention recognition by unification of multiple backward parsing models. Proceedings of the Second BioCreative Challenge Evaluation Workshop. 2007, 109-111."}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-7-S1-S9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1758-2946-7-S1-S9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-7-S1-S9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T19:11:20Z","timestamp":1630609880000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/1758-2946-7-S1-S9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,19]]},"references-count":34,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2015,12]]}},"alternative-id":["628"],"URL":"https:\/\/doi.org\/10.1186\/1758-2946-7-s1-s9","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,1,19]]},"assertion":[{"value":"19 January 2015","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S9"}}