{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:32:26Z","timestamp":1750307546848,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2009,10,1]],"date-time":"2009-10-01T00:00:00Z","timestamp":1254355200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Speech Lang. Process."],"published-print":{"date-parts":[[2009,10]]},"abstract":"<jats:p>We propose a text-categorization bootstrapping algorithm in which categories are described by relevant seed words. Our method introduces two unsupervised techniques to improve the initial categorization step of the bootstrapping scheme: (i) using latent semantic spaces to estimate the similarity among documents and words, and (ii) the Gaussian mixture algorithm, which differentiates relevant and nonrelevant category information using statistics from unlabeled examples. In particular, this second step maps the similarity scores to class posterior probabilities, and therefore reduces sensitivity to keyword-dependent variations in scores. The algorithm was evaluated on two text categorization tasks, and obtained good performance using only the category names as initial seeds. 
In particular, the performance of the proposed method proved to be equivalent to a pure supervised approach trained on 70--160 labeled documents per category.<\/jats:p>","DOI":"10.1145\/1596515.1596516","type":"journal-article","created":{"date-parts":[[2009,10,9]],"date-time":"2009-10-09T19:06:16Z","timestamp":1255115176000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Improving text categorization bootstrapping via unsupervised learning"],"prefix":"10.1145","volume":"6","author":[{"given":"Alfio","family":"Gliozzo","sequence":"first","affiliation":[{"name":"STLab-ISTC-CNR, Rome"}]},{"given":"Carlo","family":"Strapparava","sequence":"additional","affiliation":[{"name":"FBK-IRST, Povo"}]},{"given":"Ido","family":"Dagan","sequence":"additional","affiliation":[{"name":"Bar Ilan University"}]}],"member":"320","published-online":{"date-parts":[[2009,10,14]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073143"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1162\/0891201041850876"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956920"},{"volume-title":"The Lambda Calculus: Its Syntax and Semantics. North Holland","author":"Barendregt H.","key":"e_1_2_1_4_1","unstructured":"Barendregt , H. 1984. The Lambda Calculus: Its Syntax and Semantics. North Holland , Amsterdam . Barendregt, H. 1984. The Lambda Calculus: Its Syntax and Semantics. North Holland, Amsterdam."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1177\/109434209200600103"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279962"},{"volume-title":"Proceedings of the EMNLP'99 Conference.","author":"Collins M.","key":"e_1_2_1_8_1","unstructured":"Collins , M. and Singer , Y . 1999. Unsupervised models for named entity classification . In Proceedings of the EMNLP'99 Conference. Collins, M. 
and Singer, Y. 1999. Unsupervised models for named entity classification. In Proceedings of the EMNLP'99 Conference."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"volume-title":"An Electronic Lexical Database","author":"Fellbaum C.","key":"e_1_2_1_10_1","unstructured":"Fellbaum , C. 1998. WordNet . An Electronic Lexical Database . MIT Press , Cambridge, MA . Fellbaum, C. 1998. WordNet. An Electronic Lexical Database. MIT Press, Cambridge, MA."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/1314498.1314573"},{"volume-title":"Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL'05)","author":"Gliozzo A.","key":"e_1_2_1_12_1","unstructured":"Gliozzo , A. and Strapparava , C . 2005. Domains kernels for text categorization . In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL'05) . Gliozzo, A. and Strapparava, C. 2005. Domains kernels for text categorization. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL'05)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2004.05.006"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220592"},{"volume-title":"Proceedings of the 15th European Conference on Machine Learning (ECML) and the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD).","author":"Godbole S.","key":"e_1_2_1_15_1","unstructured":"Godbole , S. , Harpale , A. , Sarawagi , S. , and Chakrabarti , S . 2004. Document classification through interactive supervision of document and term labels . In Proceedings of the 15th European Conference on Machine Learning (ECML) and the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). Godbole, S., Harpale, A., Sarawagi, S., and Chakrabarti, S. 2004. 
Document classification through interactive supervision of document and term labels. In Proceedings of the 15th European Conference on Machine Learning (ECML) and the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD)."},{"volume-title":"Advances in Kernel Methods: Support Vector Learning, B. Scholkopf et al., Eds","author":"Joachims T.","key":"e_1_2_1_16_1","unstructured":"Joachims , T. 1999. Making large-scale SVM learning practical . In Advances in Kernel Methods: Support Vector Learning, B. Scholkopf et al., Eds . MIT Press , Cambridge, MA , 169--184. Joachims, T. 1999. Making large-scale SVM learning practical. In Advances in Kernel Methods: Support Vector Learning, B. Scholkopf et al., Eds. MIT Press, Cambridge, MA, 169--184."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.3115\/990820.990886"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.3115\/1072228.1072302"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1218988"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50048-7"},{"volume-title":"Proceedings of the Conference on Natural Language Processing and Information Extraction.","author":"Liu B.","key":"e_1_2_1_21_1","unstructured":"Liu , B. , Li , X. , Lee , W. S. , and Yu , P. S . 2004. Text classification by labeling words . In Proceedings of the Conference on Natural Language Processing and Information Extraction. Liu, B., Li, X., Lee, W. S., and Yu, P. S. 2004. Text classification by labeling words. In Proceedings of the Conference on Natural Language Processing and Information Extraction."},{"volume-title":"Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC'00)","author":"Magnini B.","key":"e_1_2_1_22_1","unstructured":"Magnini , B. and Cavaglia , G . 2000. Integrating subject field codes into WordNet . 
In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC'00) . 1413--1418. Magnini, B. and Cavaglia, G. 2000. Integrating subject field codes into WordNet. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC'00). 1413--1418."},{"volume-title":"Proceedings of the 2nd International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL2 ). 111--114","author":"Magnini B.","key":"e_1_2_1_23_1","unstructured":"Magnini , B. , Strapparava , C. , Pezzulo , G. , and Gliozzo , A . 2001. Using domain information for word sense disambiguation . In Proceedings of the 2nd International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL2 ). 111--114 . Magnini, B., Strapparava, C., Pezzulo, G., and Gliozzo, A. 2001. Using domain information for word sense disambiguation. In Proceedings of the 2nd International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL2 ). 111--114."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324902003029"},{"volume-title":"Proceedings of the Workshop for Unsupervised Learning in Natural Language Processing (ACL'99)","author":"McCallum A.","key":"e_1_2_1_25_1","unstructured":"McCallum , A. and Nigam , K . 1999. Text classification by bootstrapping with keywords, EM and shrinkage . In Proceedings of the Workshop for Unsupervised Learning in Natural Language Processing (ACL'99) . McCallum, A. and Nigam, K. 1999. Text classification by bootstrapping with keywords, EM and shrinkage. In Proceedings of the Workshop for Unsupervised Learning in Natural Language Processing (ACL'99)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1137\/1026034"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/1005332.1005336"},{"key":"e_1_2_1_28_1","unstructured":"Salton G. and McGill M. 1983. In Introduction to Modern Information Retrieval. McGraw-Hill New York.   Salton G. and McGill M. 1983. 
In Introduction to Modern Information Retrieval. McGraw-Hill New York."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/505282.505283"},{"key":"e_1_2_1_30_1","unstructured":"Silverman B. W. 1986. In Density Estimation for Statistics and Data Analysis. Chapman and Hall.  Silverman B. W. 1986. In Density Estimation for Statistics and Data Analysis. Chapman and Hall."},{"volume-title":"The Nature of Statistical Learning Theory","author":"Vapnik V.","key":"e_1_2_1_31_1","unstructured":"Vapnik , V. 1995. The Nature of Statistical Learning Theory . Springer , Berlin . Vapnik, V. 1995. The Nature of Statistical Learning Theory. Springer, Berlin."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.3115\/981658.981684"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/383952.384012"}],"container-title":["ACM Transactions on Speech and Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1596515.1596516","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1596515.1596516","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:23:32Z","timestamp":1750249412000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1596515.1596516"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,10]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,10]]}},"alternative-id":["10.1145\/1596515.1596516"],"URL":"https:\/\/doi.org\/10.1145\/1596515.1596516","relation":{},"ISSN":["1550-4875","1550-4883"],"issn-type":[{"type":"print","value":"1550-4875"},{"type":"electronic","value":"1550-4883"}],"subject":[],"published":{"date-parts":[[2009,10]]},"assertion":[{"value":"2008-07-01","order":0,"name":"received","label":"Rec
eived","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-10-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}