{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T09:50:49Z","timestamp":1763459449964,"version":"3.45.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2012,9,1]],"date-time":"2012-09-01T00:00:00Z","timestamp":1346457600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Consortium for Healthcare Informatics Research","award":["VA HSR HIR 08-374"],"award-info":[{"award-number":["VA HSR HIR 08-374"]}]},{"DOI":"10.13039\/100000145","name":"Division of Information and Intelligent Systems","doi-asserted-by":"publisher","award":["IIS-0712764"],"award-info":[{"award-number":["IIS-0712764"]}],"id":[{"id":"10.13039\/100000145","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000738","name":"U.S. Department of Veterans Affairs","doi-asserted-by":"publisher","award":["VA HSR HIR 08-204"],"award-info":[{"award-number":["VA HSR HIR 08-204"]}],"id":[{"id":"10.13039\/100000738","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2012,9]]},"abstract":"<jats:p>Automatic clustering of Web pages helps a number of information retrieval tasks, such as improving user interfaces, collection clustering, introducing diversity in search results, etc. Typically, Web page clustering algorithms use only features extracted from the page-text. However, the advent of social-bookmarking Web sites, such as StumbleUpon.com and Delicious.com, has led to a huge amount of user-generated content such as the social tag information that is associated with the Web pages. 
In this article, we present a subspace based feature extraction approach that leverages the social tag information to complement the page-contents of a Web page for extracting better features, with the goal of improved clustering performance. In our approach, we consider page-text and tags as two separate views of the data, and learn a shared subspace that maximizes the correlation between the two views. Any clustering algorithm can then be applied in this subspace. We then present an extension that allows our approach to be applicable even if the Web page corpus is only partially tagged, that is, when the social tags are present for not all, but only for a small number of Web pages. We compare our subspace based approach with a number of baselines that use tag information in various other ways, and show that the subspace based approach leads to improved performance on the Web page clustering task. We also discuss some possible future work including an active learning extension that can help in choosing which Web pages to get tags for, if we can get the social tags for only a small number of Web pages.<\/jats:p>","DOI":"10.1145\/2337542.2337552","type":"journal-article","created":{"date-parts":[[2012,10,12]],"date-time":"2012-10-12T16:56:02Z","timestamp":1350060962000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Leveraging Social Bookmarks from Partially Tagged Corpus for Improved Web Page Clustering"],"prefix":"10.1145","volume":"3","author":[{"given":"Anusua","family":"Trivedi","sequence":"first","affiliation":[{"name":"University of Utah, Salt Lake City"}]},{"given":"Piyush","family":"Rai","sequence":"additional","affiliation":[{"name":"University of Utah, Salt Lake City"}]},{"suffix":"III","given":"Hal","family":"Daum\u00e9","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park"}]},{"given":"Scott 
L.","family":"Duvall","sequence":"additional","affiliation":[{"name":"VA SLC Health Care System and University of Utah, Salt Lake City"}]}],"member":"320","published-online":{"date-parts":[[2012,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273500"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244303768966085"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242640"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1032649.1033432"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Blaschko M. B.","key":"e_1_2_1_5_1","unstructured":"Blaschko, M. B. and Lampert, C. H. 2008. Correlational spectral clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/3120828.3120859"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860460"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279962"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","unstructured":"Boyd S. and Vandenberghe L. 2004. Convex Optimization. Cambridge University Press Cambridge UK.","DOI":"10.5555\/993483"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015350"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135869"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009715923555"},{"volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistic.","author":"Carreira-Perpinan M. A.","key":"e_1_2_1_14_1","unstructured":"Carreira-Perpinan, M. A. and Lu, Z. 2007. The Laplacian eigenmaps latent variable model. 
In Proceedings of the International Conference on Artificial Intelligence and Statistic."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553391"},{"volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems.","author":"Cohn D.","key":"e_1_2_1_16_1","unstructured":"Cohn, D. and Hofmann, T. 2001. The missing link - a probabilistic model of document content and hypertext connectivity. In Proceedings of the Conference on Advances in Neural Information Processing Systems."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/795665.796496"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the Workshop on Learning with Multiple Views, International Conference on Machine Learning.","author":"de Sa V. R.","year":"2005","unstructured":"de Sa, V. R. 2005. Spectral Clustering with two views. In Proceedings of the Workshop on Learning with Multiple Views, International Conference on Machine Learning."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860550"},{"key":"e_1_2_1_20_1","unstructured":"Foster D. P. Kakade S. M. and Zhang T. 2008. Multi-view dimensionality reduction via canonical correlation analysis. Tech. rep. TTI-TR-2008-4 University of Pennsylvania."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/646258.686041"},{"volume-title":"Proceedings of the 3rd International Workshop on Content-Based Multimedia Indexing.","author":"Hardoon D. R.","key":"e_1_2_1_22_1","unstructured":"Hardoon, D. R. and Shawe-Taylor, J. 2003. Kcca for different level precision in content-based image retrieval. In Proceedings of the 3rd International Workshop on Content-Based Multimedia Indexing."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/0899766042321814"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/11811305_75"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Hotelling H. 1936. 
Relations between two sets of variables. Biometrika. 321--377.","DOI":"10.1093\/biomet\/28.3-4.321"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401939"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/1768841.1768852"},{"volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics.","author":"Kim M.","key":"e_1_2_1_28_1","unstructured":"Kim, M. and Pavlovic, V. 2009. Covariance operator based dimensionality reduction with extension to semi-supervised settings. In Proceedings of the International Conference on Artificial Intelligence and Statistics."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1497577.1497578"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1888305.1888320"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646167"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability.","author":"McQueen J.","year":"1967","unstructured":"McQueen, J. 1967. Some methods of classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/645531.655845"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150487"},{"volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems","author":"Rai P.","key":"e_1_2_1_35_1","unstructured":"Rai, P. and Daum\u00e9 III, H. 2009. Multi-label prediction via sparse infinite CCA. In Proceedings of the Conference on Advances in Neural Information Processing Systems. 
Vancouver, Canada."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498759.1498809"},{"volume-title":"Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention fMRI Data Analysis Workshop.","author":"Rustandi I.","key":"e_1_2_1_37_1","unstructured":"Rustandi, I., Just, M. A., and Mitchell, T. M. 2009. Integrating multiple-study multiple-subject fmri datasets using canonical correlation analysis. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention fMRI Data Analysis Workshop."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976698300017467"},{"volume-title":"Computer Sciences","author":"Settles B.","key":"e_1_2_1_39_1","unstructured":"Settles, B. 2009. Active learning literature survey. Tech. rep. 1648, Computer Sciences, University of Wisconsin--Madison."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","unstructured":"Shawe-Taylor J. and Cristianini N. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press Cambridge UK.","DOI":"10.5555\/975545"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Socher R.","key":"e_1_2_1_41_1","unstructured":"Socher, R. and Fei-Fei, L. 2010. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458114"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/645413.652137"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11222-007-9033-z"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015345"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1137\/080718206"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367594"},{"volume-title":"Proceedings of the International Conference on Machine Learning Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining (ICML\u201903)","author":"Zhu X.","key":"e_1_2_1_48_1","unstructured":"Zhu, X., Lafferty, J., and Ghahramani, Z. 2003. Combining active learning and semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the International Conference on Machine Learning Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining (ICML\u201903). 
58--65."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1600193.1600211"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2337542.2337552","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2337542.2337552","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2337542.2337552","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T09:46:15Z","timestamp":1763459175000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2337542.2337552"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,9]]}},"alternative-id":["10.1145\/2337542.2337552"],"URL":"https:\/\/doi.org\/10.1145\/2337542.2337552","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2012,9]]},"assertion":[{"value":"2010-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-03-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-09-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}