{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T22:33:49Z","timestamp":1767652429565,"version":"build-2065373602"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Clustering MEDLINE documents is usually conducted by the vector space model, which computes the content similarity between two documents by basically using the inner-product of their word vectors. Recently, the semantic information of MeSH (Medical Subject Headings) thesaurus is being applied to clustering MEDLINE documents by mapping documents into MeSH concept vectors to be clustered. However, current approaches of using MeSH thesaurus have two serious limitations: first, important semantic information may be lost when generating MeSH concept vectors, and second, the content information of the original text has been discarded.<\/jats:p><jats:p>Methods: Our new strategy includes three key points. First, we develop a sound method for measuring the semantic similarity between two documents over the MeSH thesaurus. Second, we combine both the semantic and content similarities to generate the integrated similarity matrix between documents. Third, we apply a spectral approach to clustering documents over the integrated similarity matrix.<\/jats:p><jats:p>Results: Using various 100 datasets of MEDLINE records, we conduct extensive experiments with changing alternative measures and parameters. Experimental results show that integrating the semantic and content similarities outperforms the case of using only one of the two similarities, being statistically significant. We further find the best parameter setting that is consistent over all experimental conditions conducted. 
We finally show a typical example of resultant clusters, confirming the effectiveness of our strategy in improving MEDLINE document clustering.<\/jats:p><jats:p>Contact: \u00a0zhushanfeng@gmail.com<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp338","type":"journal-article","created":{"date-parts":[[2009,6,5]],"date-time":"2009-06-05T00:24:32Z","timestamp":1244161472000},"page":"1944-1951","source":"Crossref","is-referenced-by-count":65,"title":["Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity"],"prefix":"10.1093","volume":"25","author":[{"given":"Shanfeng","family":"Zhu","sequence":"first","affiliation":[{"name":"1 Shanghai Key Lab of Intelligent Information Processing, Fudan University, 2 School of Computer Science, Fudan University, Shanghai 200433, China, 3 Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong and 4 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan"}]},{"given":"Jia","family":"Zeng","sequence":"additional","affiliation":[{"name":"1 Shanghai Key Lab of Intelligent Information Processing, Fudan University, 2 School of Computer Science, Fudan University, Shanghai 200433, China, 3 Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong and 4 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan"}]},{"given":"Hiroshi","family":"Mamitsuka","sequence":"additional","affiliation":[{"name":"1 Shanghai Key Lab of Intelligent 
Information Processing, Fudan University, 2 School of Computer Science, Fudan University, Shanghai 200433, China, 3 Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong and 4 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan"}]}],"member":"286","published-online":{"date-parts":[[2009,6,3]]},"reference":[{"volume-title":"Modern Information Retrieval.","year":"1999","author":"Baeza-Yates","key":"2023013112050123300_B1"},{"key":"2023013112050123300_B2","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1162\/coli.2006.32.1.13","article-title":"Evaluating WordNet-based measures of lexical semantic relatedness","volume":"32","author":"Budanitsky","year":"2006","journal-title":"Comput. Linguist."},{"key":"2023013112050123300_B3","article-title":"Scalable clustering methods for data mining","volume-title":"Handbook of Data Mining.","author":"Ghosh","year":"2003"},{"key":"2023013112050123300_B4","article-title":"WordNet improves text document clustering","volume-title":"Proceedings of the Semantic Web Workshop at SIGIR, Toronto, Canada.","author":"Hotho","year":"2003"},{"key":"2023013112050123300_B5","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1038\/nrg1768","article-title":"Literature mining for the biologist: from information retrieval to biological discovery","volume":"7","author":"Jensen","year":"2006","journal-title":"Nat. Rev. Genet."},{"key":"2023013112050123300_B6","article-title":"Semantic similarity based on corpus statistics and lexical taxonomy","volume-title":"Proceedings of the International Conference on Research in Computational Linguistics, (ROCLING), Taipei, Taiwan.","author":"Jiang","year":"1997"},{"key":"2023013112050123300_B7","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1145\/990308.990313","article-title":"On clusterings: good, bad and spectral","volume":"51","author":"Kannan","year":"2004","journal-title":"J. 
ACM"},{"key":"2023013112050123300_B8","doi-asserted-by":"crossref","first-page":"265","DOI":"10.7551\/mitpress\/7287.003.0018","article-title":"Combining local context and WordNet similarity for word sense identification","volume-title":"WordNet: An Electronic Lexical Database","author":"Leacock","year":"1998"},{"key":"2023013112050123300_B9","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1186\/1471-2105-7-140","article-title":"Exploring supervised and unsupervised methods to detect topics in biomedical text","volume":"7","author":"Lee","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023013112050123300_B10","first-page":"296","article-title":"An information-theoretic definition of similarity","volume-title":"Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, Wisconsin, USA","author":"Lin","year":"1998"},{"key":"2023013112050123300_B11","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1186\/1471-2105-8-423","article-title":"Pubmed related articles: a probabilistic topic-based model for content similarity","volume":"8","author":"Lin","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013112050123300_B12","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","article-title":"Investigating semantic similarity measures across the gene ontology: the relationship between sequence and annotation","volume":"19","author":"Lord","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013112050123300_B13","article-title":"A random walks view of spectral segmentation","volume-title":"Proceedings of International Conference on AI and Statistics (AISTAT), Key West, FL.","author":"Meila","year":"2001"},{"key":"2023013112050123300_B14","first-page":"427","article-title":"Relevance score normalization for metasearch","volume-title":"Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management, Atlanta, 
Georgia","author":"Montague","year":"2001"},{"key":"2023013112050123300_B15","first-page":"589","article-title":"Besides precision & recall: exploring alternative approaches to evaluating an automatic indexing tool for medline","volume-title":"AMIA Annual Symposium Proceedings, Washington, DC. USA","author":"N\u00e9v\u00e9ol","year":"2006"},{"key":"2023013112050123300_B16","first-page":"849","article-title":"On spectral clustering: analysis and an algorithm","volume-title":"Proceedings of the Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada","author":"Ng","year":"2001"},{"key":"2023013112050123300_B17","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1016\/j.jbi.2006.06.004","article-title":"Measures of semantic similarity and relatedness in the biomedical domain","volume":"40","author":"Pedersen","year":"2007","journal-title":"J. Biomed. Inform."},{"volume-title":"Introduction to Modern Information Retrieval","year":"1983","author":"Salton","key":"2023013112050123300_B18"},{"key":"2023013112050123300_B19","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/34.868688","article-title":"Normalized cuts and image segmentation","volume":"22","author":"Shi","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"2023013112050123300_B20","article-title":"Comparison of spectral clustering methods","volume-title":"UW CSE Technical report 03-05-01.","author":"Verma","year":"2003"},{"journal-title":"Spectral clustering toolbox.","year":"2003","author":"Verma","key":"2023013112050123300_B21"},{"key":"2023013112050123300_B22","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1093\/bioinformatics\/btm087","article-title":"A new method to measure the semantic similarity of GO terms","volume":"23","author":"Wang","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013112050123300_B23","doi-asserted-by":"crossref","first-page":"D13","DOI":"10.1093\/nar\/gkm1000","article-title":"Database resources of the National Center for Biotechnology Information","volume":"36","author":"Wheeler","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013112050123300_B24","first-page":"133","article-title":"Verb semantics and lexical selection","volume-title":"Meeting of the Association for Computational Linguistics (ACL), Las Cruces, New Mexico, USA","author":"Wu","year":"1994"},{"key":"2023013112050123300_B25","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1145\/1150402.1150505","article-title":"Integration of semantic-based bipartite graph representation and mutual refinement strategy for biomedical literature clustering","volume-title":"Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA","author":"Yoo","year":"2006"},{"key":"2023013112050123300_B26","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1504\/IJBRA.2007.015010","article-title":"Biomedical ontology improves biomedical literature clustering performance: a comparison study","volume":"3","author":"Yoo","year":"2007","journal-title":"Int. J. Bioinform. Res. 
Appl."},{"key":"2023013112050123300_B27","first-page":"115","article-title":"A comparative study of ontology based term similarity measures on PubMed document clustering","volume-title":"Proceedings of the 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand","author":"Zhang","year":"2007"},{"key":"2023013112050123300_B28","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1007\/s10115-004-0194-1","article-title":"Generative model-based document clustering: a comparative study","volume":"8","author":"Zhong","year":"2005","journal-title":"Knowl. Inf. Syst."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/15\/1944\/48993143\/bioinformatics_25_15_1944.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/15\/1944\/48993143\/bioinformatics_25_15_1944.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,9]],"date-time":"2025-02-09T22:37:05Z","timestamp":1739140625000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/15\/1944\/212456"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,6,3]]},"references-count":28,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2009,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp338","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2009,8,1]]},"published":{"date-parts":[[2009,6,3]]}}}