{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T06:30:09Z","timestamp":1760596209481,"version":"3.41.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2009,8,1]],"date-time":"2009-08-01T00:00:00Z","timestamp":1249084800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>Web search clustering is a way to reorganize search results (also called \u201csnippets\u201d) into a form that is more convenient to browse. There are three key requirements for such post-retrieval clustering systems: (1) the clustering algorithm should group similar documents together; (2) clusters should be labeled with descriptive phrases; and (3) the clustering system should provide high-quality clustering without downloading the whole Web page.<\/jats:p>\n          <jats:p>This article introduces a novel framework for clustering Vietnamese Web search results that addresses the three issues above. The main motivation is that, by enriching short snippets with hidden topics discovered from huge document collections on the Internet, the framework can cluster and label such snippets effectively in a topic-oriented manner without downloading whole Web pages. Our approach is based on recent successful topic analysis models, such as Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation. 
The underlying idea of the framework is that we collect a very large external data collection called \u201cuniversal dataset,\u201d and then build a clustering system on both the original snippets and a rich set of hidden topics discovered from the universal data collection. This can be seen as a richer representation of snippets to be clustered. We carry out a careful evaluation of our method and show that it can yield impressive clustering quality.<\/jats:p>","DOI":"10.1145\/1568292.1568295","type":"journal-article","created":{"date-parts":[[2009,9,1]],"date-time":"2009-09-01T17:52:59Z","timestamp":1251827579000},"page":"1-40","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Web Search Clustering and Labeling with Hidden Topics"],"prefix":"10.1145","volume":"8","author":[{"given":"Cam-Tu","family":"Nguyen","sequence":"first","affiliation":[{"name":"Tohoku University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuan-Hieu","family":"Phan","sequence":"additional","affiliation":[{"name":"Tohoku University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Susumu","family":"Horiguchi","sequence":"additional","affiliation":[{"name":"Tohoku University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thu-Trang","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Vietnam National University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Quang-Thuy","family":"Ha","sequence":"additional","affiliation":[{"name":"Vietnam National University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1020281327116"},{"key":"e_1_2_1_2_1","unstructured":"Baamboo. 2008. Vietnamese search engine. 
http:\/\/mp3.baamboo.coms."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/980845.980859"},{"volume-title":"Proceedings of the 4th International Conference on Intelligent Text Processing and Computational Linguistics. 370--381","author":"Banerjee S.","key":"e_1_2_1_4_1","unstructured":"Banerjee, S. and Pedersen, T. 2003. The design, implementation and use of the ngram statistics. In Proceedings of the 4th International Conference on Intelligent Text Processing and Computational Linguistics. 370--381."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277909"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143859"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1214\/07-AOAS114"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242675"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860470"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/332040.332418"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/133160.133214"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1062745.1062760"},{"volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI\u201907)","author":"Gabrilovich E.","key":"e_1_2_1_15_1","unstructured":"Gabrilovich, E. and Markovitch, S. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. 
In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI\u201907)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/11880561_3"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0307752101"},{"volume-title":"Parameter estimation for text analysis. Tech. rep","author":"Heinrich G.","key":"e_1_2_1_18_1","unstructured":"Heinrich, G. 2005. Parameter estimation for text analysis. Tech. rep., University of Leipzig and vsonix GmbH."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI\u201999)","author":"Hofmann T.","year":"1999","unstructured":"Hofmann, T. 1999. Probabilistic latent semantic analysis. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI\u201999)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390367"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/281250.281253"},{"key":"e_1_2_1_22_1","first-page":"73","article-title":"Recent advances in clustering: A brief survey","volume":"1","author":"Kotsiantis S.","year":"2004","unstructured":"Kotsiantis, S. and Pintelas, P. E. 2004. Recent advances in clustering: A brief survey. WSEAS Trans. Inform. Sci. Appl. 1, 1, 73--81.","journal-title":"WSEAS Trans. Inform. Sci. Appl."},{"key":"e_1_2_1_23_1","unstructured":"Manning C. D. 
and Sch\u00fctze H. 1999. Foundations of Statistical Natural Language Processing. MIT Press."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281192.1281246"},{"volume-title":"A tolerance rough set approach to clustering Web search results. Master\u2019s thesis","author":"Ngo C.-L.","key":"e_1_2_1_25_1","unstructured":"Ngo, C.-L. 2003. A tolerance rough set approach to clustering Web search results. Master\u2019s thesis, Warsaw University."},{"volume-title":"Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation (PACLIC\u201906)","author":"Nguyen C.-T.","key":"e_1_2_1_26_1","unstructured":"Nguyen, C.-T., Nguyen, T.-K., Phan, X. H., Nguyen, L. M., and Ha, Q. T. 2006. Vietnamese word segmentation with CRFs and SVMs: An investigation. In Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation (PACLIC\u201906). 215--222."},{"volume-title":"An algorithm for clustering Web search result. Master\u2019s thesis","author":"Osinski S.","key":"e_1_2_1_27_1","unstructured":"Osinski, S. 2003. An algorithm for clustering Web search result. Master\u2019s thesis. Poznan University of Technology, Poland."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367510"},{"key":"e_1_2_1_29_1","unstructured":"Popescul A. and Ungar L. 2000. 
Automatic labeling of document clusters. http:\/\/www.cis.upenn.edu\/~popescul\/Publications\/popesculcolabeling.pdf."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135834"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/WI.2006.92"},{"key":"e_1_2_1_32_1","unstructured":"Socbay. 2008. Vietnamese search engine. http:\/\/www.socbay.com."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1146598.1146650"},{"key":"e_1_2_1_34_1","unstructured":"Vivisimo. 2008. Clustering engine. http:\/\/vivisimo.com\/."},{"key":"e_1_2_1_35_1","unstructured":"Vnnic. 2008. Vietnam Internet Center. http:\/\/www.thongkeinternet.vn."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2007.86"},{"key":"e_1_2_1_37_1","unstructured":"Wikipedia. 2008. Latent semantic analysis. http:\/\/en.wikipedia.org\/wiki."},{"key":"e_1_2_1_38_1","unstructured":"Xalo. 2008. Vietnamese search engine. http:\/\/xalo.vn."},{"volume-title":"Proceedings of the National Conference on Artificial Intelligence (AAAI\u201907)","author":"Yih W.","key":"e_1_2_1_39_1","unstructured":"Yih, W. and Meek, C. 2007. Improving similarity measures for short segments of text. 
In Proceedings of the National Conference on Artificial Intelligence (AAAI\u201907)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(99)00054-7"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1008992.1009030"},{"key":"e_1_2_1_42_1","unstructured":"Zing. 2008. Vietnamese Web site directory. http:\/\/directory.zing.vn."}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1568292.1568295","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1568292.1568295","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:38:48Z","timestamp":1750253928000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1568292.1568295"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":42,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.1145\/1568292.1568295"],"URL":"https:\/\/doi.org\/10.1145\/1568292.1568295","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2009,8]]},"assertion":[{"value":"2008-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-08-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}