{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T10:29:41Z","timestamp":1758709781603},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2007,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Biomedical ontologies are critical for integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems. The effectiveness of these systems is heavily dependent on the quality of the ontological terms and their classifications. To assist in developing and maintaining the ontologies objectively, we propose automatic approaches to classify and\/or validate their semantic categories. In previous work, we developed an approach using contextual syntactic features obtained from a large domain corpus to reclassify and validate concepts of the Unified Medical Language System (UMLS), a comprehensive resource of biomedical terminology. In this paper, we introduce another classification approach based on words of the concept strings and compare it to the contextual syntactic approach.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The string-based approach achieved an error rate of 0.143, with a mean reciprocal rank of 0.907. The context-based and string-based approaches were found to be complementary, and the error rate was reduced further by applying a linear combination of the two classifiers. The advantage of combining the two approaches was especially manifested on test data with sufficient contextual features, achieving the lowest error rate of 0.055 and a mean reciprocal rank of 0.969.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>The lexical features provide another semantic dimension in addition to syntactic contextual features that support the classification of ontological concepts. The classification errors of each dimension can be further reduced through appropriate combination of the complementary classifiers.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-8-264","type":"journal-article","created":{"date-parts":[[2007,7,25]],"date-time":"2007-07-25T06:13:26Z","timestamp":1185344006000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Using contextual and lexical features to restructure and validate the classification of biomedical concepts"],"prefix":"10.1186","volume":"8","author":[{"given":"Jung-Wei","family":"Fan","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hua","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carol","family":"Friedman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2007,7,24]]},"reference":[{"issue":"1","key":"1636_CR1","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"M Ashburner","year":"2000","unstructured":"Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038\/75556.","journal-title":"Nat Genet"},{"issue":"6","key":"1636_CR2","doi-asserted-by":"publisher","first-page":"478","DOI":"10.1016\/j.jbi.2003.11.007","volume":"36","author":"C Rosse","year":"2003","unstructured":"Rosse C, Mejino JL: A reference ontology for biomedical informatics: the Foundational Model of Anatomy. J Biomed Inform. 2003, 36 (6): 478-500. 10.1016\/j.jbi.2003.11.007.","journal-title":"J Biomed Inform"},{"issue":"4","key":"1636_CR3","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1055\/s-0038-1634945","volume":"32","author":"DA Lindberg","year":"1993","unstructured":"Lindberg DA, Humphreys BL, McCray AT: The Unified Medical Language System. Methods Inf Med. 1993, 32 (4): 281-291.","journal-title":"Methods Inf Med"},{"issue":"1","key":"1636_CR4","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1136\/jamia.1998.0050012","volume":"5","author":"KE Campbell","year":"1998","unstructured":"Campbell KE, Oliver DE, Shortliffe EH: The Unified Medical Language System: toward a collaborative approach for solving terminologic problems. J Am Med Inform Assoc. 1998, 5 (1): 12-16.","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"1636_CR5","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1016\/j.jbi.2005.11.006","volume":"39","author":"AC Yu","year":"2006","unstructured":"Yu AC: Methods in biomedical ontology. J Biomed Inform. 2006, 39 (3): 252-266. 10.1016\/j.jbi.2005.11.006.","journal-title":"J Biomed Inform"},{"issue":"1","key":"1636_CR6","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1093\/bib\/6.1.57","volume":"6","author":"AM Cohen","year":"2005","unstructured":"Cohen AM, Hersh WR: A survey of current work in biomedical text mining. Brief Bioinform. 2005, 6 (1): 57-71. 10.1093\/bib\/6.1.57.","journal-title":"Brief Bioinform"},{"issue":"1\u20132","key":"1636_CR7","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1080\/07388550590935571","volume":"25","author":"J Natarajan","year":"2005","unstructured":"Natarajan J, Berrar D, Hack CJ, Dubitzky W: Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications. Crit Rev Biotechnol. 2005, 25 (1\u20132): 31-52. 10.1080\/07388550590935571.","journal-title":"Crit Rev Biotechnol"},{"issue":"2","key":"1636_CR8","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1038\/nrg1768","volume":"7","author":"LJ Jensen","year":"2006","unstructured":"Jensen LJ, Saric J, Bork P: Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet. 2006, 7 (2): 119-129. 10.1038\/nrg1768.","journal-title":"Nat Rev Genet"},{"issue":"3","key":"1636_CR9","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1093\/bib\/6.3.239","volume":"6","author":"I Spasic","year":"2005","unstructured":"Spasic I, Ananiadou S, McNaught J, Kumar A: Text mining and ontologies in biomedicine: making sense of raw text. Brief Bioinform. 2005, 6 (3): 239-251. 10.1093\/bib\/6.3.239.","journal-title":"Brief Bioinform"},{"key":"1636_CR10","doi-asserted-by":"crossref","unstructured":"Bodenreider O: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004, D267-270. 10.1093\/nar\/gkh061. 32 Database","DOI":"10.1093\/nar\/gkh061"},{"key":"1636_CR11","unstructured":"The NCBI Entrez Taxonomy. [http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?db=Taxonomy]"},{"key":"1636_CR12","unstructured":"Online Mendelian Inheritance in Man, OMIM (TM). [http:\/\/www.ncbi.nlm.nih.gov\/sites\/entrez?db=OMIM]"},{"issue":"4","key":"1636_CR13","doi-asserted-by":"publisher","first-page":"467","DOI":"10.1197\/jamia.M2314","volume":"14","author":"JW Fan","year":"2007","unstructured":"Fan JW, Friedman C: Semantic classification of biomedical concepts using distributional similarity. J Am Med Inform Assoc. 2007, 14 (4): 467-77.","journal-title":"J Am Med Inform Assoc"},{"key":"1636_CR14","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198242246.001.0001","volume-title":"A theory of language and information: a mathematical approach","author":"ZS Harris","year":"1991","unstructured":"Harris ZS: A theory of language and information: a mathematical approach. 1991, Oxford [England], New York: Clarendon Press; Oxford University Press"},{"key":"1636_CR15","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1002\/cfg.255","volume":"4","author":"AT McCray","year":"2003","unstructured":"McCray AT: An upper level ontology for the biomedical domain. Comp Funct Genom. 2003, 4: 80-84. 10.1002\/cfg.255.","journal-title":"Comp Funct Genom"},{"key":"1636_CR16","first-page":"554","volume-title":"AMIA Annu Symp Proc","author":"TC Rindflesch","year":"2003","unstructured":"Rindflesch TC, Libbus B, Hristovski D, Aronson AR, Kilicoglu H: Semantic relations asserting the etiology of genetic diseases. AMIA Annu Symp Proc. 2003, 554-558."},{"issue":"6","key":"1636_CR17","doi-asserted-by":"publisher","first-page":"462","DOI":"10.1016\/j.jbi.2003.11.003","volume":"36","author":"TC Rindflesch","year":"2003","unstructured":"Rindflesch TC, Fiszman M: The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. J Biomed Inform. 2003, 36 (6): 462-477. 10.1016\/j.jbi.2003.11.003.","journal-title":"J Biomed Inform"},{"issue":"6","key":"1636_CR18","doi-asserted-by":"publisher","first-page":"450","DOI":"10.1016\/j.jbi.2003.11.001","volume":"36","author":"JJ Cimino","year":"2003","unstructured":"Cimino JJ, Min H, Perl Y: Consistency across the hierarchies of the UMLS Semantic Network and Metathesaurus. J Biomed Inform. 2003, 36 (6): 450-461. 10.1016\/j.jbi.2003.11.001.","journal-title":"J Biomed Inform"},{"key":"1636_CR19","first-page":"300","volume-title":"IEEE Proc Syst Mans Cybern","author":"A Burgun","year":"1999","unstructured":"Burgun A, Botti G, Fieschi M, Le Beux P: Sharing knowledge in medicine: semantic and ontologic facets of medical concepts. IEEE Proc Syst Mans Cybern. 1999, 300-305."},{"key":"1636_CR20","first-page":"216","volume-title":"Medinfo","author":"AT McCray","year":"2001","unstructured":"McCray AT, Burgun A, Bodenreider O: Aggregating UMLS semantic types for reducing conceptual complexity. Medinfo. 2001, 216-220."},{"issue":"2","key":"1636_CR21","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1109\/TITB.2002.1006296","volume":"6","author":"Z Chen","year":"2002","unstructured":"Chen Z, Perl Y, Halper M, Geller J, Gu H: Partitioning the UMLS semantic network. IEEE Trans Inf Technol Biomed. 2002, 6 (2): 102-108. 10.1109\/TITB.2002.1006296.","journal-title":"IEEE Trans Inf Technol Biomed"},{"issue":"1","key":"1636_CR22","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1016\/j.artmed.2004.06.002","volume":"33","author":"L Zhang","year":"2005","unstructured":"Zhang L, Perl Y, Halper M, Geller J, Hripcsak G: A lexical metaschema for the UMLS semantic network. Artif Intell Med. 2005, 33 (1): 41-59. 10.1016\/j.artmed.2004.06.002.","journal-title":"Artif Intell Med"},{"issue":"1","key":"1636_CR23","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/j.artmed.2004.02.002","volume":"31","author":"H Gu","year":"2004","unstructured":"Gu H, Perl Y, Elhanan G, Min H, Zhang L, Peng Y: Auditing concept categorizations in the UMLS. Artif Intell Med. 2004, 31 (1): 29-44. 10.1016\/j.artmed.2004.02.002.","journal-title":"Artif Intell Med"},{"issue":"1","key":"1636_CR24","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1136\/jamia.1998.0050041","volume":"5","author":"JJ Cimino","year":"1998","unstructured":"Cimino JJ: Auditing the Unified Medical Language System with semantic methods. J Am Med Inform Assoc. 1998, 5 (1): 41-51.","journal-title":"J Am Med Inform Assoc"},{"key":"1636_CR25","first-page":"612","volume-title":"Proc AMIA Symp","author":"Y Peng","year":"2002","unstructured":"Peng Y, Halper MH, Perl Y, Geller J: Auditing the UMLS for redundant classifications. Proc AMIA Symp. 2002, 612-616."},{"key":"1636_CR26","volume-title":"Mathematical structures of language","author":"ZS Harris","year":"1968","unstructured":"Harris ZS: Mathematical structures of language. 1968, New York: Interscience Publishers"},{"key":"1636_CR27","first-page":"166","volume-title":"Proc Intl Conf Recent Adv Nat Lang Process","author":"P Cimiano","year":"2005","unstructured":"Cimiano P, V\u00f6lker J: Towards large-scale, open-domain and ontology-based named entity classification. Proc Intl Conf Recent Adv Nat Lang Process. 2005, 166-172."},{"key":"1636_CR28","first-page":"67","volume-title":"The balancing act: combining symbolic and statistical approaches to language","author":"V Hatzivassiloglou","year":"1996","unstructured":"Hatzivassiloglou V: Do we need linguistics when we have statistics? A comparative analysis of the contributions of linguistic cues to a statistical word grouping system. The balancing act: combining symbolic and statistical approaches to language. Edited by: Klavans JL, Resnik P. 1996, Cambridge (MA): MIT Press, 67-94."},{"key":"1636_CR29","doi-asserted-by":"crossref","first-page":"25","DOI":"10.3115\/1034678.1034693","volume-title":"Proc Annu Meet Assoc Comput Linguist","author":"L Lee","year":"1999","unstructured":"Lee L: Measures of distributional similarity. Proc Annu Meet Assoc Comput Linguist. 1999, 25-32."},{"key":"1636_CR30","first-page":"714","volume-title":"AMIA Annu Symp Proc","author":"T Sibanda","year":"2006","unstructured":"Sibanda T, He T, Szolovits P, Uzuner O: Syntactically-informed semantic category recognition in discharge summaries. AMIA Annu Symp Proc. 2006, 714-718."},{"issue":"1","key":"1636_CR31","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1075\/term.11.1.06wee","volume":"11","author":"J Weeds","year":"2005","unstructured":"Weeds J, Dowdall J, Schneider G, Keller B, Weir D: Using distributional similarity to organize biomedical terminology. Terminology. 2005, 11 (1): 107-141.","journal-title":"Terminology"},{"issue":"Suppl 1","key":"1636_CR32","doi-asserted-by":"publisher","first-page":"i180","DOI":"10.1093\/bioinformatics\/btg1023","volume":"19","author":"JD Kim","year":"2003","unstructured":"Kim JD, Ohta T, Tateisi Y, Tsujii J: GENIA corpus \u2013 semantically annotated corpus for bio-textmining. Bioinformatics. 2003, 19 (Suppl 1): i180-182. 10.1093\/bioinformatics\/btg1023.","journal-title":"Bioinformatics"},{"issue":"2","key":"1636_CR33","first-page":"282","volume":"5","author":"RA Calvo","year":"2004","unstructured":"Calvo RA, Lee J, Li X: Managing content with automatic document classification. J Digit Inf. 2004, 5 (2): 282-","journal-title":"J Digit Inf"},{"issue":"1\u20132","key":"1636_CR34","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1023\/A:1023824908771","volume":"19","author":"J Diederich","year":"2003","unstructured":"Diederich J, Kindermann O, Leopold E, Paass G: Authorship attribution with support vector machines. APPL INTELL. 2003, 19 (1\u20132): 109-123. 10.1023\/A:1023824908771.","journal-title":"APPL INTELL"},{"key":"1636_CR35","first-page":"109","volume-title":"Text mining and its applications","author":"F Sebastiani","year":"2005","unstructured":"Sebastiani F: Text categorization. Text mining and its applications. Edited by: Zanasi A. 2005, Southampton, UK: WIT Press, 109-129."},{"issue":"6","key":"1636_CR36","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1016\/j.jbi.2004.08.012","volume":"37","author":"KJ Lee","year":"2004","unstructured":"Lee KJ, Hwang YS, Kim S, Rim HC: Biomedical named entity recognition using two-phase model based on SVMs. J Biomed Inform. 2004, 37 (6): 436-447. 10.1016\/j.jbi.2004.08.012.","journal-title":"J Biomed Inform"},{"issue":"6","key":"1636_CR37","doi-asserted-by":"publisher","first-page":"498","DOI":"10.1016\/j.jbi.2004.08.007","volume":"37","author":"M Torii","year":"2004","unstructured":"Torii M, Kamboj S, Vijay-Shanker K: Using name-internal and contextual features to classify biological terms. J Biomed Inform. 2004, 37 (6): 498-511. 10.1016\/j.jbi.2004.08.007.","journal-title":"J Biomed Inform"},{"key":"1636_CR38","volume-title":"Machine Learning","author":"TM Mitchell","year":"1997","unstructured":"Mitchell TM: Machine Learning. 1997, New York: McGraw-Hill"},{"key":"1636_CR39","doi-asserted-by":"crossref","unstructured":"Wain HM, Lush MJ, Ducluzeau F, Khodiyar VK, Povey S: Genew: the Human Gene Nomenclature Database, 2004 updates. Nucleic Acids Res. 2004, D255-257. 10.1093\/nar\/gkh072. 32 Database","DOI":"10.1093\/nar\/gkh072"},{"key":"1636_CR40","doi-asserted-by":"publisher","first-page":"14","DOI":"10.3115\/1118149.1118152","volume-title":"Proceedings of Workshop on NLP in the Biomedical Domain, ACL","author":"KB Cohen","year":"2002","unstructured":"Cohen KB, Acquaah-Mensah GK, Dolbey AE, Hunter L: Contrast and variability in gene names. Proceedings of Workshop on NLP in the Biomedical Domain, ACL. 2002, 14-20. ; Philadelphia"},{"key":"1636_CR41","unstructured":"SNOMED: SNOMED CT. [http:\/\/www.ihtsdo.org\/our-standards\/snomed-ct\/]"},{"key":"1636_CR42","unstructured":"TREC Genomics Track \u2013 Roadmap. [http:\/\/ir.ohsu.edu\/genomics\/roadmap.html]"},{"key":"1636_CR43","first-page":"214","volume-title":"Pac Symp Biocomput","author":"PV Ogren","year":"2004","unstructured":"Ogren PV, Cohen KB, Acquaah-Mensah GK, Eberlein J, Hunter L: The compositional structure of Gene Ontology terms. Pac Symp Biocomput. 2004, 214-225."},{"issue":"5","key":"1636_CR44","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1136\/jamia.1998.0050421","volume":"5","author":"KE Campbell","year":"1998","unstructured":"Campbell KE, Oliver DE, Spackman KA, Shortliffe EH: Representing thoughts, words, and things in the UMLS. J Am Med Inform Assoc. 1998, 5 (5): 421-431.","journal-title":"J Am Med Inform Assoc"},{"key":"1636_CR45","unstructured":"PubMed stopwords. [http:\/\/www.ncbi.nlm.nih.gov\/books\/bv.fcgi?rid=helppubmed.table.pubmedhelp.T43]"},{"key":"1636_CR46","first-page":"448","volume-title":"Proc AMIA Symp","author":"AT McCray","year":"2001","unstructured":"McCray AT, Bodenreider O, Malley JD, Browne AC: Evaluating UMLS strings for natural language processing. Proc AMIA Symp. 2001, 448-452."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-8-264.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,17]],"date-time":"2024-02-17T01:29:09Z","timestamp":1708133349000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-8-264"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,7,24]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,12]]}},"alternative-id":["1636"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-8-264","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,7,24]]},"assertion":[{"value":"6 March 2007","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2007","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2007","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"264"}}