{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T12:19:25Z","timestamp":1774700365578,"version":"3.50.1"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"published-print":{"date-parts":[[2013,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Named entity recognition (NER) is an important task in clinical natural language processing (NLP) research. Machine learning (ML) based NER methods have shown good performance in recognizing entities in clinical text. Algorithms and features are two important factors that largely affect the performance of ML-based NER systems. Conditional Random Fields (CRFs), a sequential labelling algorithm, and Support Vector Machines (SVMs), which is based on large margin theory, are two typical machine learning algorithms that have been widely applied to clinical NER tasks. For features, syntactic and semantic information of context words has often been used in clinical NER systems. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, and word representation features, which contain word-level back-off information over large unlabelled corpus by unsupervised algorithms, have not been extensively investigated for clinical text processing. 
Therefore, the primary goal of this study is to evaluate the use of SSVMs and word representation features in clinical NER tasks.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>In this study, we developed SSVMs-based NER systems to recognize clinical entities in hospital discharge summaries, using the data set from the concept extraction task in the 2010 i2b2 NLP challenge. We compared the performance of CRFs and SSVMs-based NER classifiers with the same feature sets. Furthermore, we extracted two different types of word representation features (clustering-based representation features and distributional representation features) and integrated them with the SSVMs-based clinical NER system. We then reported the performance of SSVM-based NER systems with different types of word representation features.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results and discussion<\/jats:title>\n            <jats:p>Using the same training (N = 27,837) and test (N = 45,009) sets in the challenge, our evaluation showed that the SSVMs-based NER systems achieved better performance than the CRFs-based systems for clinical entity recognition, when the same features were used. Both types of word representation features (clustering-based and distributional representations) improved the performance of ML-based NER systems. By combining two different types of word representation features together with SSVMs, our system achieved a highest F-measure of 85.82%, which outperformed the best system reported in the challenge by 0.6%. 
Our results show that SSVMs is a great potential algorithm for clinical NLP research, and both types of unsupervised word representation features are beneficial to clinical NER tasks.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1472-6947-13-s1-s1","type":"journal-article","created":{"date-parts":[[2013,4,5]],"date-time":"2013-04-05T10:20:06Z","timestamp":1365157206000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":73,"title":["Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features"],"prefix":"10.1186","volume":"13","author":[{"given":"Buzhou","family":"Tang","sequence":"first","affiliation":[]},{"given":"Hongxin","family":"Cao","sequence":"additional","affiliation":[]},{"given":"Yonghui","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Min","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Hua","family":"Xu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,4,5]]},"reference":[{"key":"648_CR1","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1136\/jamia.1994.95236146","volume":"1","author":"C Friedman","year":"1994","unstructured":"Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB: A general natural-language text processor for clinical radiology. J Am Med Inform Assoc. 1994, 1: 161-174. 10.1136\/jamia.1994.95236146.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR2","first-page":"128","volume-title":"Yearb Med Inform","author":"SM Meystre","year":"2008","unstructured":"Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF: Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 
2008, 128-144."},{"key":"648_CR3","first-page":"284","volume-title":"Proc Annu Symp Comput Appl Med Care","author":"PJ Haug","year":"1995","unstructured":"Haug PJ, Koehler S, Lau LM, Wang P, Rocha R, Huff SM: Experience with a mixed semantic\/syntactic parser. Proc Annu Symp Comput Appl Med Care. 1995, 284-288."},{"key":"648_CR4","first-page":"814","volume-title":"Proc AMIA Annu Fall Symp","author":"PJ Haug","year":"1997","unstructured":"Haug PJ, Christensen L, Gundersen M, Clemons B, Koehler S, Bauer K: A natural language parsing system for encoding admitting diagnoses. Proc AMIA Annu Fall Symp. 1997, 814-818."},{"key":"648_CR5","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1136\/jamia.2009.002733","volume":"17","author":"AR Aronson","year":"2010","unstructured":"Aronson AR, Lang FM: An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010, 17: 229-236.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR6","first-page":"156","volume-title":"AMIA Annu Symp Proc","author":"JC Denny","year":"2008","unstructured":"Denny JC, Miller RA, Johnson KB, Spickard A: Development and evaluation of a clinical note section header terminology. AMIA Annu Symp Proc. 2008, 156-160."},{"key":"648_CR7","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1136\/jamia.2009.001560","volume":"17","author":"GK Savova","year":"2010","unstructured":"Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG: Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010, 17: 507-513. 
10.1136\/jamia.2009.001560.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR8","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1186\/1472-6947-6-30","volume":"6","author":"QT Zeng","year":"2006","unstructured":"Zeng QT, Goryachev S, Weiss S, Sordo M, Murphy SN, Lazarus R: Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak. 2006, 6: 30-10.1186\/1472-6947-6-30.","journal-title":"BMC Med Inform Decis Mak"},{"key":"648_CR9","doi-asserted-by":"publisher","first-page":"514","DOI":"10.1136\/jamia.2010.003947","volume":"17","author":"O Uzuner","year":"2010","unstructured":"Uzuner O, Solti I, Cadag E: Extracting medication information from clinical text. J Am Med Inform Assoc. 2010, 17: 514-518. 10.1136\/jamia.2010.003947.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR10","doi-asserted-by":"publisher","first-page":"552","DOI":"10.1136\/amiajnl-2011-000203","volume":"18","author":"O Uzuner","year":"2011","unstructured":"Uzuner O, South BR, Shen S, DuVall SL: 2010 i2b2\/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc. 2011, 18: 552-556. 10.1136\/amiajnl-2011-000203.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR11","doi-asserted-by":"publisher","first-page":"528","DOI":"10.1136\/jamia.2010.003855","volume":"17","author":"S Doan","year":"2010","unstructured":"Doan S, Bastarache L, Klimkowski S, Denny JC, Xu H: Integrating existing natural language processing tools for medication extraction from discharge summaries. J Am Med Inform Assoc. 2010, 17: 528-531. 
10.1136\/jamia.2010.003855.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR12","doi-asserted-by":"publisher","first-page":"532","DOI":"10.1136\/jamia.2010.003657","volume":"17","author":"I Spasic","year":"2010","unstructured":"Spasic I, Sarafraz F, Keane JA, Nenadic G: Medication information extraction with linguistic pattern matching and semantic rules. J Am Med Inform Assoc. 2010, 17: 532-535. 10.1136\/jamia.2010.003657.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR13","doi-asserted-by":"publisher","first-page":"524","DOI":"10.1136\/jamia.2010.003939","volume":"17","author":"J Patrick","year":"2010","unstructured":"Patrick J, Li M: High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge. J Am Med Inform Assoc. 2010, 17: 524-527. 10.1136\/jamia.2010.003939.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR14","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1136\/jamia.2010.004077","volume":"17","author":"Z Li","year":"2010","unstructured":"Li Z, Liu F, Antieau L, Cao Y, Yu H: Lancet: a high precision medication event extraction system for clinical text. J Am Med Inform Assoc. 2010, 17: 563-567. 10.1136\/jamia.2010.004077.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR15","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1136\/jamia.2010.004028","volume":"17","author":"SM Meystre","year":"2010","unstructured":"Meystre SM, Thibault J, Shen S, Hurdle JF, South BR: Textractor: a hybrid system for medications and reason for their prescription extraction from clinical text documents. J Am Med Inform Assoc. 2010, 17: 559-562. 
10.1136\/jamia.2010.004028.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR16","doi-asserted-by":"publisher","first-page":"557","DOI":"10.1136\/amiajnl-2011-000150","volume":"18","author":"B de Bruijn","year":"2011","unstructured":"de Bruijn B, Cherry C, Kiritchenko S, Martin J, Zhu X: Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assoc. 2011, 18: 557-562. 10.1136\/amiajnl-2011-000150.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR17","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1136\/amiajnl-2011-000163","volume":"18","author":"M Jiang","year":"2011","unstructured":"Jiang M, Chen Y, Liu M, Rosenbloom ST, Mani S, Denny JC, Xu H: A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. J Am Med Inform Assoc. 2011, 18: 601-606. 10.1136\/amiajnl-2011-000163.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR18","doi-asserted-by":"publisher","first-page":"580","DOI":"10.1136\/amiajnl-2011-000155","volume":"18","author":"M Torii","year":"2011","unstructured":"Torii M, Wagholikar K, Liu H: Using machine learning for concept extraction on clinical documents from multiple data sources. J Am Med Inform Assoc. 2011, 18: 580-587. 10.1136\/amiajnl-2011-000155.","journal-title":"J Am Med Inform Assoc"},{"key":"648_CR19","doi-asserted-by":"publisher","first-page":"94","DOI":"10.3115\/1572306.1572326","volume-title":"Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing","author":"D Li","year":"2008","unstructured":"Li D, Kipper-Schuler K, Savova G: Conditional random fields and support vector machines for disorder named entity recognition in clinical texts. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing. 2008, Columbus, Ohio: Association for Computational Linguistics, 94-95. pp. 
94-95"},{"key":"648_CR20","first-page":"91","volume-title":"Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature","author":"Y-C Wu","year":"2006","unstructured":"Wu Y-C, Fan T-K, Lee Y-S, Yen S-J: Extracting named entities using support vector machines. Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature. 2006, Singapore: Springer-Verlag, 91-103. pp. 91-103"},{"key":"648_CR21","doi-asserted-by":"publisher","first-page":"142","DOI":"10.3115\/1117601.1117635","volume-title":"Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7","author":"T Kudoh","year":"2000","unstructured":"Kudoh T, Matsumoto Y: Use of support vector learning for chunk identification. Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7. 2000, Lisbon, Portugal: Association for Computational Linguistics, 142-144. pp. 142-144"},{"key":"648_CR22","first-page":"1","volume-title":"Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies","author":"T Kudo","year":"2001","unstructured":"Kudo T, Matsumoto Y: Chunking with support vector machines. Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies. 2001, Pittsburgh, Pennsylvania: Association for Computational Linguistics, 1-8. pp. 1-8"},{"key":"648_CR23","first-page":"1453","volume":"6","author":"I Tsochantaridis","year":"2005","unstructured":"Tsochantaridis I, Joachims T, Hofmann T, Altun Y: Large Margin Methods for Structured and Interdependent Output Variables. J Mach Learn Res. 
2005, 6: 1453-1484.","journal-title":"J Mach Learn Res"},{"key":"648_CR24","unstructured":"Turian J, Ratinov L, Bengio Y: Word representations: a simple and general method for semi-supervised learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden: Association for Computational Linguistics"},{"key":"648_CR25","first-page":"337","volume-title":"HLT-NAACL","author":"SaG Miller","year":"2004","unstructured":"Miller SaG, Jethran , Zamanian , Alex : Name Tagging with Word Clusters and Discriminative Training. HLT-NAACL. 2004, 337-342. pp. 337-342"},{"key":"648_CR26","first-page":"384","volume-title":"Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics","author":"J Turian","year":"2010","unstructured":"Turian J, Ratinov L, Bengio Y: Word representations: a simple and general method for semi-supervised learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 2010, Uppsala, Sweden: Association for Computational Linguistics, 384-394. pp. 384-394"},{"key":"648_CR27","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1016\/j.jbi.2011.10.007","volume":"45","author":"S Jonnalagadda","year":"2012","unstructured":"Jonnalagadda S, Cohen T, Wu S, Gonzalez G: Enhancing clinical concept extraction with distributional semantics. Journal of Biomedical Informatics. 2012, 45: 129-140. 10.1016\/j.jbi.2011.10.007.","journal-title":"Journal of Biomedical Informatics"},{"key":"648_CR28","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1145\/2390068.2390073","volume-title":"Proceedings of the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics","author":"B Tang","year":"2012","unstructured":"Tang B, Cao H, Wu Y, Jiang M, Xu H: Clinical entity recognition using structural support vector machines with rich features. 
Proceedings of the ACM Sixth International Workshop on Data and Text Mining in Biomedical Informatics. 2012, New York: ACM, 13-20. 10.1145\/2390068.2390073."},{"key":"648_CR29","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1007\/s10994-009-5108-8","volume":"77","author":"T Joachims","year":"2009","unstructured":"Joachims T, Finley T, Yu C-NJ: Cutting-plane training of structural SVMs. Mach Learn. 2009, 77: 27-59. 10.1007\/s10994-009-5108-8.","journal-title":"Mach Learn"},{"key":"648_CR30","first-page":"293","volume-title":"AMIA Annu Symp Proc","author":"Y He","year":"2008","unstructured":"He Y, Kayaalp M: Biological entity recognition with conditional random fields. AMIA Annu Symp Proc. 2008, 293-297."},{"key":"648_CR31","doi-asserted-by":"publisher","first-page":"2794","DOI":"10.1093\/bioinformatics\/bti414","volume":"21","author":"Y Song","year":"2005","unstructured":"Song Y, Kim E, Lee GG, Yi BK: POSBIOTM-NER: a trainable biomedical named-entity recognition system. Bioinformatics. 2005, 21: 2794-2796. 10.1093\/bioinformatics\/bti414.","journal-title":"Bioinformatics"},{"key":"648_CR32","doi-asserted-by":"publisher","first-page":"173","DOI":"10.3115\/977035.977059","volume-title":"Ninth Conference of the European Chapter of the Association for Computational Linguistics","author":"EFTK Sang","year":"1999","unstructured":"Sang EFTK, Veenstra J: Representing text chunks. Ninth Conference of the European Chapter of the Association for Computational Linguistics. 1999, 173-179. pp. 173-179"},{"key":"648_CR33","first-page":"467","volume":"18","author":"PF Brown","year":"1992","unstructured":"Brown PF, deSouza PV, Mercer RL, Pietra VJD, Lai JC: Class-based n-gram models of natural language. Comput Linguist. 
1992, 18: 467-479.","journal-title":"Comput Linguist"},{"key":"648_CR34","doi-asserted-by":"publisher","first-page":"203","DOI":"10.3758\/BF03204766","volume":"28","author":"K Lund","year":"1996","unstructured":"Lund K, Burgess C: Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers. 1996, 28: 203-208.","journal-title":"Behavior Research Methods, Instruments, & Computers"},{"key":"648_CR35","first-page":"4","volume":"1","author":"F Wilcoxon","year":"1945","unstructured":"Wilcoxon F: Individual comparisons by ranking methods. Biometrics Bulletin. 1945, 1: 4-","journal-title":"Biometrics Bulletin"},{"key":"648_CR36","first-page":"199","volume-title":"NIPS'11","author":"PS Dhillon","year":"2011","unstructured":"Dhillon PS, Foster D, Ungar L: Multi-View Learning of Word Embeddings via CCA. NIPS'11. 2011, 199-207."}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1472-6947-13-S1-S1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T23:15:38Z","timestamp":1630538138000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/1472-6947-13-S1-S1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,4]]},"references-count":36,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2013,4]]}},"alternative-id":["648"],"URL":"https:\/\/doi.org\/10.1186\/1472-6947-13-s1-s1","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,4]]},"assertion":[{"value":"5 April 2013","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S1"}}