{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T06:11:14Z","timestamp":1775283074646,"version":"3.50.1"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2008,2,1]],"date-time":"2008-02-01T00:00:00Z","timestamp":1201824000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Web"],"published-print":{"date-parts":[[2008,2]]},"abstract":"<jats:p>This article addresses the problem of spam blog (splog) detection using temporal and structural regularity of content, post time, and links. Splogs are undesirable blogs meant to attract search engine traffic, used solely for promoting affiliate sites. Blogs represent popular online media, and splogs not only degrade the quality of search engine results, but also waste network resources. Splog detection is made difficult by the lack of stable content descriptors.<\/jats:p>\n          <jats:p>We have developed a new technique for detecting splogs, based on the observation that a blog is a dynamic, growing sequence of entries (or posts) rather than a collection of individual pages. In our approach, splogs are recognized by their temporal characteristics and content. There are three key ideas in our splog detection framework. (a) We represent the blog temporal dynamics using self-similarity matrices defined on the histogram intersection similarity measure of the time, content, and link attributes of posts, to investigate the temporal changes of the post sequence. (b) We study the blog temporal characteristics using a visual representation derived from the self-similarity measures. 
The visual signature reveals correlations between attributes and posts that differ by blog type (normal blogs versus splogs). (c) We propose two types of novel temporal features to capture the splog temporal characteristics. In our splog detector, these novel features are combined with content-based features. We extract a content-based feature vector from blog home pages as well as from different parts of the blog. The dimensionality of the feature vector is reduced by Fisher linear discriminant analysis. We have tested an SVM-based splog detector using the proposed features on real-world datasets, achieving 90% accuracy.<\/jats:p>","DOI":"10.1145\/1326561.1326565","type":"journal-article","created":{"date-parts":[[2008,3,12]],"date-time":"2008-03-12T22:35:44Z","timestamp":1205361344000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Detecting splogs via temporal dynamics using self-similarity analysis"],"prefix":"10.1145","volume":"2","author":[{"given":"Yu-Ru","family":"Lin","sequence":"first","affiliation":[{"name":"Arizona State University, AZ"}]},{"given":"Hari","family":"Sundaram","sequence":"additional","affiliation":[{"name":"Arizona State University, AZ"}]},{"given":"Yun","family":"Chi","sequence":"additional","affiliation":[{"name":"NEC Laboratories America, Cupertino, CA"}]},{"given":"Junichi","family":"Tatemura","sequence":"additional","affiliation":[{"name":"NEC Laboratories America, Cupertino, CA"}]},{"given":"Belle L.","family":"Tseng","sequence":"additional","affiliation":[{"name":"NEC Laboratories America, Cupertino, CA"}]}],"member":"320","published-online":{"date-parts":[[2008,3,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb).","author":"Benczur A.","unstructured":"Benczur, A., Csalogany, K., Sarlos, T., and Uher, M. 2005. 
SpamRank: Fully automatic link spam detection. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1189702.1189703"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 3rd Annual Symposium on Document Analysis and Information Retrieval.","author":"Cavnar W. B.","unstructured":"Cavnar, W. B. and Trenkle, J. M. 1994. N-gram-based text categorization. In Proceedings of the 3rd Annual Symposium on Document Analysis and Information Retrieval."},{"key":"e_1_2_1_4_1","author":"Chang C.-C.","year":"2001","unstructured":"Chang, C.-C. and Lin, C.-J. 2001. LIBSVM: A library for support vector machines. ntu.edu.tw\/~cjlin\/papers\/libsvm.ps.gz."},{"key":"e_1_2_1_5_1","unstructured":"Duda, R. O., Hart, P. E., and Stork, D. G. 2001. Pattern Classification. John Wiley & Sons, Inc., 
New York."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1209\/0295-5075\/4\/9\/004"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1017074.1017077"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1076034.1076066"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060839"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the International Conference on Music Information Retrieval. 265--266","author":"Foote J.","unstructured":"Foote, J., Cooper, M., and Nam, U. 2002. Audio retrieval by rhythmic similarity. In Proceedings of the International Conference on Music Information Retrieval. 265--266."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 30th International Conference on Very Large Data Bases (VLDB'04)","author":"Gy\u00f6ngyi Z.","unstructured":"Gy\u00f6ngyi, Z., Garcia-Molina, H., and Pedersen, J. 2004. Combating Web spam with TrustRank. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB'04). Toronto, Canada. Morgan Kaufmann. 576--587."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb).","author":"Gy\u00f6ngyi Z.","unstructured":"Gy\u00f6ngyi, Z. and Garcia-Molina, H. 2005. Web spam taxonomy. 
In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb)."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB)","author":"Gy\u00f6ngyi Z.","unstructured":"Gy\u00f6ngyi, Z., Berkhin, P., Garcia-Molina, H., and Pedersen, J. 2006. Link spam detection based on mass estimation. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB). Seoul, Korea. 439--450."},{"key":"e_1_2_1_14_1","volume-title":"WWW2006 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics","author":"Han S.","unstructured":"Han, S., Ahn, Y., Moon, S., and Jeong, H. 2006. Collaborative blog spam filtering using adaptive percolation search. WWW2006 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics. Edinburgh."},{"key":"e_1_2_1_15_1","volume-title":"Welcome to the splogosphere: 75% of new pings are spings (splogs)","author":"Kolari P.","year":"2005","unstructured":"Kolari, P. 2005. Welcome to the splogosphere: 75% of new pings are spings (splogs). 
http:\/\/ebiquity.umbc.edu\/blogger\/2005\/12\/15\/welcome-to-the-splogosphere-75-of-new-blog-posts-are-spam\/."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs.","author":"Kolari P.","unstructured":"Kolari, P., Finin, T., and Joshi, A. 2006a. SVMs for the blogosphere: Blog identification and splog detection. In Proceedings of the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conference.","author":"Kolari P.","unstructured":"Kolari, P., Java, A., and Finin, T. 2006b. Characterizing the splogosphere. In Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conference."},{"key":"e_1_2_1_18_1","unstructured":"Kolari, P., Java, A., Finin, T., Mayfield, J., Joshi, A., and Martineau, J. 2006c. Blog track open task: Spam blog classification. TREC Blog Track Notebook."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06)","author":"Kolari P.","unstructured":"Kolari, P., Java, A., Finin, T., Oates, T., and Joshi, A. 2006d. Detecting spam blogs: A machine learning approach. 
In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06). Boston, MA."},{"key":"e_1_2_1_20_1","volume-title":"IEEE International Conference on Multimedia and Expo 2007: 2030--2033","author":"Lin Y.","unstructured":"Lin, Y., Sundaram, H., Chi, Y., Tatemura, J., and Tseng, B. 2007. Splog detection using content, time and link structures. IEEE International Conference on Multimedia and Expo 2007: 2030--2033."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 15th Text REtrieval Conference (TREC'06)","author":"Lin Y.-R.","unstructured":"Lin, Y.-R., Chen, W.-Y., Shi, X., Sia, R., Song, X., Chi, Y., Hino, K., Sundaram, H., Tatemura, J., and Tseng, B. 2006. The splog detection task and a solution based on temporal and link properties. In Proceedings of the 15th Text REtrieval Conference (TREC'06)."},{"key":"e_1_2_1_22_1","unstructured":"Macdonald, C. and Ounis, I. 2006. The TREC Blogs06 collection: Creating and analyzing a blog test collection. TR-2006-224. 
Department of Computer Science, University of Glasgow."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb).","author":"Mishne G.","unstructured":"Mishne, G., Carmel, D., and Lempel, R. 2005. Blocking blog spam with language model disagreement. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb)."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem.","author":"Narisawa K.","unstructured":"Narisawa, K., Yamada, Y., Ikeda, D., and Takeda, M. 2006. Detecting blog spams using the vocabulary size of all substrings in their copies. In Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.69.026113"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135794"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the Human Language Technology Conference of the NAACL. Companion Volume: Short Papers, 137--140","author":"Salvetti F.","unstructured":"Salvetti, F. and Nicolov, N. Weblog classification for fast splog filtering: A URL language model segmentation approach. 
In Proceedings of the Human Language Technology Conference of the NAACL. Companion Volume: Short Papers, 137--140."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.51"},{"key":"e_1_2_1_29_1","unstructured":"SURBL. SURBL: Spam URI realtime blocklists. http:\/\/www.surbl.org\/."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00130487"},{"key":"e_1_2_1_31_1","unstructured":"UMBRIA. 2006. Spam in the blogosphere. http:\/\/www.umbrialistens.com\/files\/uploads\/umbria_splog.pdf."},{"key":"e_1_2_1_32_1","unstructured":"Urvoy, T., Lavergne, T., and Filoche, P. 2006. Tracking web spam with hidden style similarity. AIRWeb, Seattle, WA."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/966389.966390"},{"key":"e_1_2_1_34_1","unstructured":"Wikipedia. http:\/\/en.wikipedia.org\/wiki\/."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1062745.1062762"},{"key":"e_1_2_1_36_1","unstructured":"Zawodny, J. 2005. Yahoo! Search blog: A defense against comment spam. 
http:\/\/www.ysearchblog.com\/archives\/000069.html."}],"container-title":["ACM Transactions on the Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1326561.1326565","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1326561.1326565","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:56:25Z","timestamp":1750254985000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1326561.1326565"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,2]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,2]]}},"alternative-id":["10.1145\/1326561.1326565"],"URL":"https:\/\/doi.org\/10.1145\/1326561.1326565","relation":{},"ISSN":["1559-1131","1559-114X"],"issn-type":[{"value":"1559-1131","type":"print"},{"value":"1559-114X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,2]]},"assertion":[{"value":"2007-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2007-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-03-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}