{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T13:10:06Z","timestamp":1723209006020},"reference-count":38,"publisher":"Wiley","issue":"4","license":[{"start":{"date-parts":[[2013,2,20]],"date-time":"2013-02-20T00:00:00Z","timestamp":1361318400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J Am Soc Inf Sci Tec"],"published-print":{"date-parts":[[2013,4]]},"abstract":"<jats:p>We describe the latent semantic indexing subspace signature model (<jats:styled-content style=\"fixed-case\">LSISSM<\/jats:styled-content>) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top\u2010ranking latent concept dimensions. <jats:styled-content style=\"fixed-case\">LSISSM<\/jats:styled-content> matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (<jats:styled-content style=\"fixed-case\">LSI<\/jats:styled-content>) term subspace and <jats:styled-content style=\"fixed-case\">LSI<\/jats:styled-content> document subspace. <jats:styled-content style=\"fixed-case\">LSISSM<\/jats:styled-content> does feature reduction and finds a low\u2010rank approximation of scalable and sparse term\u2010document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard <jats:styled-content style=\"fixed-case\">K<\/jats:styled-content>\u2010means and self\u2010organizing maps compared with the vector space model and the traditional <jats:styled-content style=\"fixed-case\">LSI<\/jats:styled-content> model. 
The unique contribution ranking mechanism in <jats:styled-content style=\"fixed-case\">LSISSM<\/jats:styled-content> also improves the initialization of standard <jats:styled-content style=\"fixed-case\">K<\/jats:styled-content>\u2010means compared with random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two\u2010stage initialization strategy based on <jats:styled-content style=\"fixed-case\">LSISSM<\/jats:styled-content> significantly reduces the running time of standard <jats:styled-content style=\"fixed-case\">K<\/jats:styled-content>\u2010means procedures.<\/jats:p>","DOI":"10.1002\/asi.22623","type":"journal-article","created":{"date-parts":[[2013,2,20]],"date-time":"2013-02-20T20:50:41Z","timestamp":1361393441000},"page":"844-860","source":"Crossref","is-referenced-by-count":6,"title":["Document clustering using the <scp>LSI<\/scp> subspace signature model"],"prefix":"10.1002","volume":"64","author":[{"given":"W.Z.","family":"Zhu","sequence":"first","affiliation":[{"name":"College of Information Science and Technology Drexel University  3141 Chestnut Street Philadelphia PA 19104\u20102875"}]},{"given":"R.B.","family":"Allen","sequence":"additional","affiliation":[{"name":"College of Information Science and Technology Drexel University  3141 Chestnut Street Philadelphia PA 19104\u20102875"}]}],"member":"311","published-online":{"date-parts":[[2013,2,20]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:NEPL.0000023449.95030.8f"},{"key":"e_1_2_10_3_1","unstructured":"Arthur D. &Vassilvitskii S.(2007).K\u2010means\u2009++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM SIAM Symposium on Discrete Algorithms (ACM SIAM \u201807) (pp.1027\u20131035).New York:ACM Press."},{"key":"e_1_2_10_4_1","doi-asserted-by":"crossref","unstructured":"Beil F. Ester M. &Xu X.(2002).Frequent term\u2010based text clustering. 
In Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining (KDD \u201802) (pp.436\u2013442).New York:ACM Press.","DOI":"10.1145\/775047.775110"},{"key":"e_1_2_10_5_1","doi-asserted-by":"publisher","DOI":"10.1162\/jmlr.2003.3.4-5.993"},{"key":"e_1_2_10_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.198"},{"key":"e_1_2_10_7_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_2_10_8_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20148"},{"key":"e_1_2_10_9_1","volume-title":"Algorithms for clustering data","author":"Dubes R.C.","year":"1988"},{"key":"e_1_2_10_10_1","doi-asserted-by":"crossref","unstructured":"Fung B.C.M. Wang K. &Ester M.(2003).Hierarchical document clustering using frequent itemsets. In Proceedings of the SIAM International Conference on Data Mining (SDM \u201803) (pp.59\u201370).San Francisco CA:SIAM. Available at:http:\/\/users.encs.concordia.ca\/~fung\/pub\/FWE03sdm.pdf","DOI":"10.1137\/1.9781611972733.6"},{"key":"e_1_2_10_11_1","doi-asserted-by":"crossref","unstructured":"Gelbukh A. &Sidorov G.(2001).Zipf and Heaps laws\u2019 coefficients depend on language. In Proceedings of the Second International Conference on Computational Linguistics and Intelligent Text Processing (CICLing '01) (pp.332\u2013335).London:Springer\u2010Verlag.","DOI":"10.1007\/3-540-44686-9_33"},{"key":"e_1_2_10_12_1","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.114.2.211"},{"key":"e_1_2_10_13_1","doi-asserted-by":"crossref","unstructured":"Hofmann T.(1999).Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR \u201899) (pp. 
50\u201357).New York:ACM Press.","DOI":"10.1145\/312624.312649"},{"key":"e_1_2_10_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.58325"},{"key":"e_1_2_10_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095366.1095368"},{"key":"e_1_2_10_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2004.11.007"},{"key":"e_1_2_10_17_1","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1007\/11498186_19","volume-title":"Foundations of data mining and knowledge discovery","author":"Kontostathis A.","year":"2005"},{"key":"e_1_2_10_18_1","doi-asserted-by":"crossref","unstructured":"Lin X. Soergel D. &Marchionini G.(1991).A self\u2010organizing semantic map for information retrieval. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR \u201891) (pp.262\u2013269).New York:ACM Press.","DOI":"10.1145\/122860.122887"},{"key":"e_1_2_10_19_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"e_1_2_10_20_1","unstructured":"Mason J.E. &Spiteri R.J.(2008 April).A new adaptive folding\u2010up algorithm for information retrieval. 
Paper presented at the Text Mining Workshop 2008 held in conjunction with the Eighth SIAM International Conference on Data Mining (SDM 2008) Atlanta GA."},{"issue":"6","key":"e_1_2_10_21_1","first-page":"386","article-title":"The perceptron: A probabilistic model for information storage and organization in the brain","volume":"65","author":"Rosenblatt F.","year":"1958","journal-title":"Cornell Aeronautical Laboratory, Psychological Review"},{"key":"e_1_2_10_22_1","first-page":"405","volume-title":"An introduction to neural and electronic networks","author":"Rumelhart D.E.","year":"1990"},{"key":"e_1_2_10_23_1","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/5236.001.0001"},{"key":"e_1_2_10_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/361219.361220"},{"key":"e_1_2_10_25_1","doi-asserted-by":"crossref","unstructured":"Sch\u00fctze H. &Silverstein C.(1997).Projections for efficient document clustering. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR \\x9197) (pp.74\u201381).New York:ACM Press.","DOI":"10.1145\/278459.258539"},{"key":"e_1_2_10_26_1","doi-asserted-by":"crossref","unstructured":"Silic A. Moens M.F. \u017dmak L. &Basic B.D.(2008).Comparing document classification schemes using K\u2010means clustering. In Proceedings of the 12th International Conference on Knowledge\u2010Based Intelligent Information and Engineering Systems. (pp.615\u2013624).Berlin Germany:Springer.","DOI":"10.1007\/978-3-540-85563-7_78"},{"key":"e_1_2_10_27_1","unstructured":"Steinbach M. Karypis G. &Kumar V.(2000 August).A comparison of document clustering techniques. 
Paper presented at the KDD Workshop of TextMining 2000 Boston MA."},{"key":"e_1_2_10_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2006.12.018"},{"key":"e_1_2_10_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apnum.2007.01.016"},{"key":"e_1_2_10_30_1","doi-asserted-by":"crossref","unstructured":"Toutanova K. Klein D. Manning D.C. &Singer Y.(2003).Feature\u2010rich part\u2010of\u2010speech tagging with a cyclic dependency network. In Proceedings of HLT and the Third Conference of the North American Chapter of the ACL (pp.252\u2013259).","DOI":"10.3115\/1073445.1073478"},{"key":"e_1_2_10_31_1","doi-asserted-by":"crossref","unstructured":"Toutanova K. &Manning D.C.(2000).Enriching the knowledge sources used in a maximum entropy part\u2010of\u2010speech tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP\/VLC \u201800) (pp.63\u201370).Stroudsburg PA:Association for Computational Linguistics.","DOI":"10.3115\/1117794.1117802"},{"key":"e_1_2_10_32_1","doi-asserted-by":"crossref","unstructured":"Wang K. Xu C. &Liu B.(1999).Clustering transactions using large items. In Proceedings of the Eighth International Conference on Information and Knowledge Management (ACM CIKM \u201899) (pp. 483\u2013490).New York:ACM Press.","DOI":"10.1145\/319950.320054"},{"key":"e_1_2_10_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90027-1"},{"key":"e_1_2_10_34_1","volume-title":"The roots of backpropagation: from ordered derivatives to neural networks and political forecasting","author":"Werbos P.J.","year":"1994"},{"key":"e_1_2_10_35_1","doi-asserted-by":"crossref","unstructured":"Xu W. &Gong Y.(2004).Document clustering by concept factorization. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR \u201804) (pp. 
202\u2013209).New York NY:ACM Press.","DOI":"10.1145\/1008992.1009029"},{"key":"e_1_2_10_36_1","doi-asserted-by":"crossref","unstructured":"Xu W. Liu X. &Gong Y.(2003).Document clustering based on non\u2010negative matrix factorization. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR \u201803) (pp.267\u2013273).New York NY:ACM Press.","DOI":"10.1145\/860435.860485"},{"key":"e_1_2_10_37_1","unstructured":"Zhu W.(2009).Text clustering and active learning using a latent semantic indexing (lsi) subspace signature model and query expansion (Unpublished doctoral dissertation). Drexel University. Available at:http:\/\/idea.library.drexel.edu\/bitstream\/1860\/3077\/1\/Zhu_Weizhong.pdf"},{"key":"e_1_2_10_38_1","unstructured":"Zhu W. &Allen R. B.(Manuscript submitted for publication).Active learning for text classification using the LSI subspace signature model."},{"key":"e_1_2_10_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cag.2007.01.025"}],"container-title":["Journal of the American Society for Information Science and 
Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fasi.22623","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/asi.22623","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,6]],"date-time":"2023-10-06T03:31:41Z","timestamp":1696563101000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/asi.22623"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,2,20]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,4]]}},"alternative-id":["10.1002\/asi.22623"],"URL":"https:\/\/doi.org\/10.1002\/asi.22623","archive":["Portico"],"relation":{},"ISSN":["1532-2882","1532-2890"],"issn-type":[{"value":"1532-2882","type":"print"},{"value":"1532-2890","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,2,20]]}}}