{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T11:25:50Z","timestamp":1773055550955,"version":"3.50.1"},"reference-count":18,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>In recent years, Data Mining technology has been applied more than ever before in the field of traditional Chinese medicine (TCM) to discover regularities from the experience accumulated in the past thousands of years in China. Electronic medical records (or clinical records) of TCM, containing larger amount of information than well-structured data of prescriptions extracted manually from TCM literature such as information related to medical treatment process, could be an important source for discovering valuable regularities of TCM. However, they are collected by TCM doctors on a day to day basis without the support of authoritative editorial board, and owing to different experience and background of TCM doctors, the same concept might be described in several different terms. Therefore, clinical records of TCM cannot be used directly to Data Mining and Knowledge Discovery. This paper focuses its attention on the phenomena of \"one symptom with different names\" and investigates a series of metrics for automatically normalizing symptom names in clinical records of TCM.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>A series of extensive experiments were performed to validate the metrics proposed, and they have shown that the hybrid similarity metrics integrating literal similarity and remedy-based similarity are more accurate than the others which are based on literal similarity or remedy-based similarity alone, and the highest F-Measure (65.62%) of all the metrics is achieved by hybrid similarity metric VSM+TFIDF+SWD.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Automatic symptom name normalization is an essential task for discovering knowledge from clinical data of TCM. The problem is introduced for the first time by this paper. The results have verified that the investigated metrics are reasonable and accurate, and the hybrid similarity metrics are much better than the metrics based on literal similarity or remedy-based similarity alone.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-11-40","type":"journal-article","created":{"date-parts":[[2010,1,20]],"date-time":"2010-01-20T19:28:55Z","timestamp":1264015735000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Automatic symptom name normalization in clinical records of traditional Chinese medicine"],"prefix":"10.1186","volume":"11","author":[{"given":"Yaqiang","family":"Wang","sequence":"first","affiliation":[]},{"given":"Zhonghua","family":"Yu","sequence":"additional","affiliation":[]},{"given":"Yongguang","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Kaikuo","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Xia","family":"Chen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,1,20]]},"reference":[{"key":"3497_CR1","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1016\/j.artmed.2006.07.005","volume":"38","author":"F Yi","year":"2006","unstructured":"Yi F, Zhaohui W, Xuezhong Z, Zhongmei Z, Weiyu F: Knowledge discovery in Traditional Chinese Medicine: State of the art and perspectives. Artif Intell Med 2006, 38: 219\u2013236. 10.1016\/j.artmed.2006.07.005","journal-title":"Artif Intell Med"},{"key":"3497_CR2","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1109\/BMEI.2008.244","volume-title":"Proceedings of the 2008 International Conference on BioMedical Engineering and Informatics","author":"C Li","year":"2008","unstructured":"Li C, Tang C, Zeng C, Wu J, Chen Y, Qiu J, Zhu J, Dai L, Jiang Y: Discovering Multi-dimensional Major Medicines from Traditional Chinese Medicine Prescriptions. Proceedings of the 2008 International Conference on BioMedical Engineering and Informatics 2008, 260\u2013264. full_text"},{"key":"3497_CR3","first-page":"251","volume-title":"Proceedings of ninth Pacific Rim International Conference on Artificial Intelligence","author":"L Chuan","year":"2006","unstructured":"Chuan L, Changjie T, Zhonghua Y, Yintian L, Tianqing Z, Qihong L, Mingfang Z, Yongguang J: Mining Multi-dimensional Frequent Patterns Without Data Cube Construction. Proceedings of ninth Pacific Rim International Conference on Artificial Intelligence 2006, 251\u2013260."},{"key":"3497_CR4","first-page":"73","volume-title":"Proceedings of the IJCAL-2003 Workshop on Information Integration on the Web","author":"WC William","year":"2003","unstructured":"William WC, Pradeep R, Stephen EF: A Comparison of String Distance Metrics for Name-Matching Tasks. Proceedings of the IJCAL-2003 Workshop on Information Integration on the Web 2003, 73\u201378."},{"key":"3497_CR5","unstructured":"An Introduction To Jaro-Winkler Distance[http:\/\/en.wikipedia.org\/wiki\/Jaro-Winkler_distance]"},{"key":"3497_CR6","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","volume":"147","author":"TF Smith","year":"1981","unstructured":"Smith TF, Waterman MS: Identification of Common Molecular Subsequences. J Mol Biol 1981, 147: 195\u2013197. 10.1016\/0022-2836(81)90087-5","journal-title":"J Mol Biol"},{"key":"3497_CR7","doi-asserted-by":"publisher","first-page":"705","DOI":"10.1016\/0022-2836(82)90398-9","volume":"162","author":"O Gotoh","year":"1982","unstructured":"Gotoh O: An Improved Algorithm for Matching Biological Sequences. J Mol Biol 1982, 162: 705\u2013708. 10.1016\/0022-2836(82)90398-9","journal-title":"J Mol Biol"},{"key":"3497_CR8","first-page":"538","volume-title":"Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"J Glen","year":"2002","unstructured":"Glen J, Jennifer W: SimRank: A Measure of Structural-Context Similarity. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2002, 538\u2013543."},{"key":"3497_CR9","first-page":"491","volume-title":"Proceedings of the Twelfth European Conference on Machine Learning","author":"DT Peter","year":"2001","unstructured":"Peter DT: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning 2001, 491\u2013502."},{"key":"3497_CR10","unstructured":"Resources for Algorithms in Biology[http:\/\/ai.stanford.edu\/~serafim\/CS374_2008\/]"},{"key":"3497_CR11","volume-title":"Technical Report","author":"EW William","year":"2006","unstructured":"William EW: Overview of Record Linkage and Current Research Directions. In Technical Report. Statistical Research Division, U.S. Bureau of the Census, Washington, DC; 2006."},{"key":"3497_CR12","volume-title":"Technical Report","author":"EW William","year":"1999","unstructured":"William EW: The State of Record Linkage and Current Research Problems. In Technical Report. Statistical Research Division, U.S. Bureau of the Census, Washington, DC; 1999."},{"key":"3497_CR13","first-page":"355","volume-title":"Business Survey methods","author":"EW William","year":"1995","unstructured":"William EW: Matching and Record Linkage. In Business Survey methods. New York: J. Wiley; 1995:355\u2013384."},{"key":"3497_CR14","volume-title":"Proceedings of the Linking NSF Scientist and Engineering Data to Scientific Productivity Data Workshop","author":"EW William","year":"2008","unstructured":"William EW: Overview of Record Linkage for Name Matching. Proceedings of the Linking NSF Scientist and Engineering Data to Scientific Productivity Data Workshop 2008. [http:\/\/www.albany.edu\/~marschke\/Workshop\/WinklerNSFOverview080212.pdf]"},{"key":"3497_CR15","volume-title":"Proceedings of the Australasian Data Mining Workshop","author":"C Peter","year":"2002","unstructured":"Peter C, Tim C, Justin XZ: Probabilistic Name and Address Cleaning and Standardization. Proceedings of the Australasian Data Mining Workshop 2002. [http:\/\/datamining.anu.edu.au\/publications\/2002\/adm2002-cleaning.pdf]"},{"key":"3497_CR16","doi-asserted-by":"publisher","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","volume":"19","author":"PW Lord","year":"2003","unstructured":"Lord PW, Stevens RD, Brass A, C Goble A: Investigating Semantic Similarity Measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics 2003, 19: 1275\u20131283. 10.1093\/bioinformatics\/btg153","journal-title":"Bioinformatics"},{"key":"3497_CR17","volume-title":"Proceedings of Web Service Semantics Workshop at WWW","author":"H Jeffrey","year":"2005","unstructured":"Jeffrey H, William L, John D: A Semantic Similarity Measure for Semantic Web Services. Proceedings of Web Service Semantics Workshop at WWW 2005. [http:\/\/www.ai.sri.com\/WSS2005\/final-versions\/WSS2005-Hau-Final.pdf]"},{"key":"3497_CR18","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1613\/jair.514","volume":"11","author":"R Philip","year":"1999","unstructured":"Philip R: Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. J Artif Intell Res 1999, 11: 95\u2013130.","journal-title":"J Artif Intell Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-40.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T12:07:50Z","timestamp":1630498070000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-40"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,20]]},"references-count":18,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["3497"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-40","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,1,20]]},"assertion":[{"value":"19 May 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 January 2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 January 2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"40"}}