{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T21:35:44Z","timestamp":1773524144200,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2008,12,23]],"date-time":"2008-12-23T00:00:00Z","timestamp":1229990400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2008,12,23]],"date-time":"2008-12-23T00:00:00Z","timestamp":1229990400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Inf Retrieval"],"published-print":{"date-parts":[[2009,6]]},"DOI":"10.1007\/s10791-008-9083-7","type":"journal-article","created":{"date-parts":[[2008,12,22]],"date-time":"2008-12-22T20:21:36Z","timestamp":1229977296000},"page":"400-415","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Using the Web as corpus for self-training text categorization"],"prefix":"10.1007","volume":"12","author":[{"given":"Rafael","family":"Guzm\u00e1n-Cabrera","sequence":"first","affiliation":[]},{"given":"Manuel","family":"Montes-y-G\u00f3mez","sequence":"additional","affiliation":[]},{"given":"Paolo","family":"Rosso","sequence":"additional","affiliation":[]},{"given":"Luis","family":"Villase\u00f1or-Pineda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,12,23]]},"reference":[{"key":"9083_CR1","unstructured":"Aas, K., & Eikvil, L. (1999). Text categorization: A survey. Tech. Rep. 941. Norwegian Computing Center."},{"key":"9083_CR2","unstructured":"Argamon, S., & Levitan, S. (2005). Measuring the usefulness of function words for authorship attribution. In Proceedings of ACH\/ALLC Conference 2005."},{"key":"9083_CR3","unstructured":"Bekkerman, R., & Allan, J. (2004). Using bigrams in text categorization. Tech. Rep. IR-408. Center of Intelligent Information Retrieval, UMass Amherst."},{"issue":"1","key":"9083_CR4","first-page":"1","volume":"4","author":"C Chaski","year":"2005","unstructured":"Chaski, C. (2005). Who\u2019s at the keyboard: Authorship attribution in digital evidence investigations. International Journal of Digital Evidence, 4(1), 1\u201313.","journal-title":"International Journal of Digital Evidence"},{"issue":"1","key":"9083_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1007730.1007733","volume":"6","author":"NV Chawla","year":"2004","unstructured":"Chawla, N. V., Japkowicz, N., & Kotcz, A. (2004). Editorial: Special issue on learning from imbalanced data sets. SIGKDD Explorations, 6(1), 1\u20136.","journal-title":"SIGKDD Explorations"},{"key":"9083_CR6","doi-asserted-by":"crossref","unstructured":"Coyotl-Morales, R. M., Villase\u00f1or-Pineda, L., Montes-Y-G\u00f3mez, M., & Rosso, P. (2006). Authorship attribution using word sequences. In J. F. Mart\u00ednez-Trinidad, J. A. Carrasco-Ochoa, & J. Kittler (Eds.), CIARP (Vol. 4225, pp. 844\u2013853). Springer, Lecture Notes in Computer Science.","DOI":"10.1007\/11892755_87"},{"issue":"1\/2","key":"9083_CR7","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1023\/A:1023824908771","volume":"19","author":"J Diederich","year":"2003","unstructured":"Diederich, J., Kindermann, J., Leopold, E., & Paass, G. (2003). Authorship attribution with support vector machines. Applied Intelligence, 19(1\/2), 109\u2013123.","journal-title":"Applied Intelligence"},{"issue":"2","key":"9083_CR8","doi-asserted-by":"publisher","first-page":"141","DOI":"10.2307\/1401602","volume":"36","author":"HO Hartley","year":"1968","unstructured":"Hartley, H. O., & Rao, J. N. K. (1968). Classification and estimation in analysis of variance problems. Review of the International Statistical Institute, 36(2), 141\u2013147.","journal-title":"Review of the International Statistical Institute"},{"key":"9083_CR9","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1007\/BF01830689","volume":"28","author":"DI Holmes","year":"1994","unstructured":"Holmes, D. I. (1994). Authorship attribution. Computers and the Humanities, 28, 87\u2013106.","journal-title":"Computers and the Humanities"},{"key":"9083_CR10","unstructured":"Hoste, V. (2005). Optimization issues in machine learning of coreference resolution. Ph.D. thesis, Faculteit Letteren en Wijsbegeerte, Universiteit Antwerpen, Belgium."},{"key":"9083_CR11","unstructured":"Joachims, T. (1999). Transductive inference for text classification using support vector machines. In Proceedings of the 16th International Conference on Machine Learning (pp. 200\u2013209). San Francisco, CA: Morgan Kaufmann."},{"key":"9083_CR12","unstructured":"Kaster, A., Siersdorfer, S., & Weikum, G. (2005). Combining text and linguistic document representations for authorship attribution. In SIGIR Workshop: Stylistic Analysis of Text for Information Access (STYLE) (pp. 27\u201335)."},{"issue":"2","key":"9083_CR13","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1162\/089120103322711569","volume":"29","author":"A Kilgarriff","year":"2003","unstructured":"Kilgarriff, A., & Grefenstette, G. (2003). Introduction to the special issue of the Web as corpus. Computational Linguistics, 29(2), 333\u2013347.","journal-title":"Computational Linguistics"},{"key":"9083_CR14","doi-asserted-by":"crossref","unstructured":"Malyutov, M. B. (2006). Authorship attribution of texts: A review. In R. Ahlswede, L. B\u00e4umer, N. Cai, H. K. Aydinian, V. Blinovsky, C. Deppe, & H. Mashurian (Eds.), GTIT-C (Vol. 4123, pp. 362\u2013380). Springer, Lecture Notes in Computer Science.","DOI":"10.1007\/11889342_20"},{"key":"9083_CR15","doi-asserted-by":"crossref","unstructured":"Moschitti, A., & Basili, R. (2004). Complex linguistic features for text classification: A comprehensive study. In S. McDonald & J. Tait (Eds.), Proceedings of the 26th European Conference on Information Retrieval (ECIR 2004) (Vol. 2997, pp. 181\u2013196). Sunderland, UK: Springer, Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-540-24752-4_14"},{"issue":"2\/3","key":"9083_CR16","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1023\/A:1007692713085","volume":"39","author":"K Nigam","year":"2000","unstructured":"Nigam, K., Mccallum, A. K., Thrun, S., & Mitchell, T. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2\/3), 103\u2013134.","journal-title":"Machine Learning"},{"issue":"3\u20134","key":"9083_CR17","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1023\/B:INRT.0000011209.19643.e2","volume":"7","author":"F Peng","year":"2004","unstructured":"Peng, F., Schuurmans, D., Wang, S. (2004). Augmenting naive Bayes classifiers with statistical language models. Information Retrieval, 7(3\u20134), 317\u2013345.","journal-title":"Information Retrieval"},{"issue":"1","key":"9083_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/505282.505283","volume":"34","author":"F Sebastiani","year":"2002","unstructured":"Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1\u201347.","journal-title":"ACM Computing Surveys"},{"key":"9083_CR19","unstructured":"Seeger, M. (2000). Learning with labeled and unlabeled data. Tech. Rep. Edinburgh, UK: University of Edinburgh."},{"key":"9083_CR20","doi-asserted-by":"crossref","unstructured":"Smucker, M., Allan, J., & Carterette, B. (2007). A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of the ACM Sixteenth Conference on Information and Knowledge Management (pp. 623\u2013632).","DOI":"10.1145\/1321440.1321528"},{"key":"9083_CR21","unstructured":"Solorio, T. (2002). Using unlabeled data to improve classifier accuracy. Master\u2019s thesis, Computer Science Department, INAOE, Mexico."},{"key":"9083_CR22","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1023\/A:1002681919510","volume":"35","author":"E Stamatatos","year":"2001","unstructured":"Stamatatos, E., Fakotakis, N., & Kokkinakis, G. (2001). Computer-based authorship attribution without lexical measures. Computers and the Humanities, 35, 193\u2013214.","journal-title":"Computers and the Humanities"},{"key":"9083_CR23","unstructured":"Witten, I. H., & Frank, E. (1999). Data mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann."},{"key":"9083_CR24","unstructured":"Yu, B. (2006). An evaluation of text classification methods for literary study. Ph.D. thesis, Champaign, IL, USA."},{"key":"9083_CR25","doi-asserted-by":"crossref","unstructured":"Zelikovitz, S., & Hirsh, H. (2002). Integrating background knowledge into nearest-neighbor text classification. In S. Craw  & A. D. Preece (Eds.), ECCBR (Vol. 2416, pp. 1\u20135). Springer, Lecture Notes in Computer Science.","DOI":"10.1007\/3-540-46119-1_1"},{"key":"9083_CR26","unstructured":"Zelikovitz, S., & Kogan, M. (2006). Using web searches on important words to create background sets for LSI classification. In G. Sutcliffe & R. Goebel (Eds.), FLAIRS Conference (pp. 598\u2013603). AAAI Press."},{"key":"9083_CR27","doi-asserted-by":"crossref","unstructured":"Zhao, Y., & Zobel, J. (2005). Effective and scalable authorship attribution using function words. In G. G. Lee, A. Yamada, H. Meng, & S. H. Myaeng (Eds.), AIRS (Vol. 3689, pp. 174\u2013189). Springer, Lecture Notes in Computer Science.","DOI":"10.1007\/11562382_14"},{"key":"9083_CR28","unstructured":"Zhu, X. (2005). Semi-supervised learning literature survey. Tech. Rep. Computer Sciences, University of Wisconsin-Madison."}],"container-title":["Information Retrieval"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10791-008-9083-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10791-008-9083-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10791-008-9083-7","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10791-008-9083-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T14:41:24Z","timestamp":1704206484000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10791-008-9083-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,12,23]]},"references-count":28,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2009,6]]}},"alternative-id":["9083"],"URL":"https:\/\/doi.org\/10.1007\/s10791-008-9083-7","relation":{},"ISSN":["1386-4564","1573-7659"],"issn-type":[{"value":"1386-4564","type":"print"},{"value":"1573-7659","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,12,23]]},"assertion":[{"value":"11 May 2008","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 November 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 December 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}