{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:41:36Z","timestamp":1760161296800},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"S1","license":[{"start":{"date-parts":[[2015,1,19]],"date-time":"2015-01-19T00:00:00Z","timestamp":1421625600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2015,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The chemical compound and drug name recognition plays an important role in chemical text mining, and it is the basis for automatic relation extraction and event identification in chemical information processing. So a high-performance named entity recognition system for chemical compound and drug names is necessary.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>We developed a CHEMDNER system based on mixed conditional random fields (CRF) with word clustering for chemical compound and drug name recognition. For the word clustering, we used Brown's hierarchical algorithm and Skip-gram model based on deep learning with massive PubMed articles including titles and abstracts.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>This system achieved the highest F-score of 88.20% for the CDI task and the second highest F-score of 87.11% for the CEM task in BioCreative IV. The performance was further improved by multi-scale clustering based on deep learning, achieving the F-score of 88.71% for CDI and 88.06% for CEM.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>The mixed CRF model represents both the internal complexity and external contexts of the entities, and the model is integrated with word clustering to capture domain knowledge with PubMed articles including titles and abstracts. The domain knowledge helps to ensure the performance of the entity recognition, even without fine-grained linguistic features and manually designed rules.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1758-2946-7-s1-s4","type":"journal-article","created":{"date-parts":[[2015,6,18]],"date-time":"2015-06-18T11:32:05Z","timestamp":1434627125000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":37,"title":["CHEMDNER system with mixed conditional random fields and multi-scale word clustering"],"prefix":"10.1186","volume":"7","author":[{"given":"Yanan","family":"Lu","sequence":"first","affiliation":[]},{"given":"Donghong","family":"Ji","sequence":"additional","affiliation":[]},{"given":"Xiaoyuan","family":"Yao","sequence":"additional","affiliation":[]},{"given":"Xiaomei","family":"Wei","sequence":"additional","affiliation":[]},{"given":"Xiaohui","family":"Liang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2015,1,19]]},"reference":[{"issue":"Suppl 2","key":"623_CR1","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/gb-2008-9-s2-s2","volume":"9","author":"L Smith","year":"2008","unstructured":"Smith L, Tanabe LK, Ando RJ, et al: Overview of BioCreative II gene mention recognition. Genome Biology. 2008, 9 (Suppl 2): S2-10.1186\/gb-2008-9-s2-s2.","journal-title":"Genome Biology"},{"key":"623_CR2","unstructured":"2013, [http:\/\/www.biocreative.org\/tasks\/biocreative-iv\/chemdner\/]"},{"issue":"Suppl 1","key":"623_CR3","doi-asserted-by":"publisher","first-page":"S1","DOI":"10.1186\/1758-2946-7-S1-S1","volume":"7","author":"M Krallinger","year":"2015","unstructured":"Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A: CHEMDNER: The drugs and chemical names extraction challenge. J Cheminform. 2015, 7 (Suppl 1): S1-","journal-title":"J Cheminform"},{"key":"623_CR4","first-page":"473","volume-title":"proceedings of the 40th Annual Meeting on Association for Computational Linguistics","author":"G Zhou","year":"2002","unstructured":"Zhou G, Su J: Named entity recognition using an HMM-based chunk tagger. proceedings of the 40th Annual Meeting on Association for Computational Linguistics. 2002, 473-480."},{"key":"623_CR5","first-page":"1","volume":"1","author":"HL Chieu","year":"2002","unstructured":"Chieu HL, Ng HT: Named entity recognition: a maximum entropy approach using global information. Proceedings of the 19th international conference on Computational linguistics. Association for Computational Linguistics. 2002, 1: 1-7.","journal-title":"Proceedings of the 19th international conference on Computational linguistics. Association for Computational Linguistics"},{"key":"623_CR6","doi-asserted-by":"publisher","first-page":"188","DOI":"10.3115\/1119176.1119206","volume":"4","author":"A McCallum","year":"2003","unstructured":"McCallum A, Li W: Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003. Association for Computational Linguistics. 2003, 4: 188-191.","journal-title":"Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003. Association for Computational Linguistics"},{"issue":"Suppl 1","key":"623_CR7","doi-asserted-by":"publisher","first-page":"S6","DOI":"10.1186\/1471-2105-6-S1-S6","volume":"6","author":"R McDonald","year":"2005","unstructured":"McDonald R, Pereira F: Identifying gene and protein mentions in text using conditional random fields. BMC Bioinformatics. 2005, 6 (Suppl 1): S6-10.1186\/1471-2105-6-S1-S6.","journal-title":"BMC Bioinformatics"},{"key":"623_CR8","volume-title":"Conditional random fields: Probabilistic models for segmenting and labelling sequence data","author":"J Lafferty","year":"2001","unstructured":"Lafferty J, McCallum A, Pereira FC: Conditional random fields: Probabilistic models for segmenting and labelling sequence data. 2001"},{"key":"623_CR9","first-page":"337","volume":"4","author":"S Miller","year":"2004","unstructured":"Miller S, Guinness J, Zamanian A: Name Tagging with Word Clusters and Discriminative Training. HLT-NAACL. 2004, 4: 337-342.","journal-title":"HLT-NAACL"},{"issue":"4","key":"623_CR10","first-page":"467","volume":"18","author":"PF Brown","year":"1992","unstructured":"Brown PF, Desouza PV, Mercer RL, et al: Class-based n-gram models of natural language. Computational linguistics. 1992, 18 (4): 467-479.","journal-title":"Computational linguistics"},{"key":"623_CR11","first-page":"384","volume-title":"Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics","author":"J Turian","year":"2010","unstructured":"Turian J, Ratinov L, Bengio Y: Word representations: a simple and general method for semi-supervised learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. 2010, 384-394."},{"key":"623_CR12","first-page":"119","volume":"23","author":"K Ganchev","year":"2007","unstructured":"Ganchev K, Crammer K, Pereira F, et al: Penn\/umass\/chop biocreative ii systems. Proceedings of the second biocreative challenge evaluation workshop. 2007, 23: 119-124.","journal-title":"Proceedings of the second biocreative challenge evaluation workshop"},{"key":"623_CR13","first-page":"3111","volume-title":"Advances in Neural Information Processing Systems","author":"T Mikolov","year":"2013","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J: Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems. 2013, 3111-3119."},{"key":"623_CR14","doi-asserted-by":"publisher","first-page":"142","DOI":"10.3115\/1119176.1119195","volume":"4","author":"EF Tjong Kim Sang","year":"2003","unstructured":"Tjong Kim Sang EF, De Meulder F: Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003. Association for Computational Linguistics. 2003, 4: 142-147.","journal-title":"Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003. Association for Computational Linguistics"},{"issue":"Suppl 1","key":"623_CR15","doi-asserted-by":"publisher","first-page":"S5","DOI":"10.1186\/1471-2105-6-S1-S5","volume":"6","author":"J Finkel","year":"2005","unstructured":"Finkel J, Dingare S, Manning CD, Nissim M, Alex B, Grover C: Exploring the boundaries: gene and protein identification in biomedical text. BMC Bioinformatics. 2005, 6 (Suppl 1): S5-10.1186\/1471-2105-6-S1-S5.","journal-title":"BMC Bioinformatics"},{"key":"623_CR16","first-page":"109","volume":"23","author":"HS Huang","year":"2007","unstructured":"Huang HS, Lin YS, Lin KT, et al: High-recall gene mention recognition by unification of multiple backward parsing models. Proceedings of the second BioCreative challenge evaluation workshop. 2007, 23: 109-111.","journal-title":"Proceedings of the second BioCreative challenge evaluation workshop"},{"key":"623_CR17","first-page":"70","volume-title":"Proceedings of the international joint workshop on natural language processing in biomedicine and its applications. Association for Computational Linguistics","author":"JD Kim","year":"2004","unstructured":"Kim JD, Ohta T, Tsuruoka Y, et al: Introduction to the bio-entity recognition task at JNLPBA. Proceedings of the international joint workshop on natural language processing in biomedicine and its applications. Association for Computational Linguistics. 2004, 70-75."},{"key":"623_CR18","volume-title":"Generalized expectation criteria for semi-supervised learning of conditional random fields","author":"GS Mann","year":"2008","unstructured":"Mann GS, McCallum A: Generalized expectation criteria for semi-supervised learning of conditional random fields. 2008"},{"issue":"6","key":"623_CR19","doi-asserted-by":"publisher","first-page":"965","DOI":"10.1109\/JSTSP.2010.2075990","volume":"4","author":"D Yu","year":"2010","unstructured":"Yu D, Wang S, Deng L: Sequential labeling using deep-structured conditional random fields. Selected Topics in Signal Processing, IEEE Journal. 2010, 4 (6): 965-973.","journal-title":"Selected Topics in Signal Processing, IEEE Journal"},{"key":"623_CR20","first-page":"2493","volume":"12","author":"R Collobert","year":"2011","unstructured":"Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P: Natural language processing (almost) from scratch. The Journal of Machine Learning Research. 2011, 12: 2493-2537.","journal-title":"The Journal of Machine Learning Research"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-7-S1-S4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1758-2946-7-S1-S4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-7-S1-S4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T19:29:01Z","timestamp":1630610941000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/1758-2946-7-S1-S4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,19]]},"references-count":20,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2015,12]]}},"alternative-id":["623"],"URL":"https:\/\/doi.org\/10.1186\/1758-2946-7-s1-s4","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,1,19]]},"assertion":[{"value":"19 January 2015","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S4"}}