{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T06:11:13Z","timestamp":1775283073129,"version":"3.50.1"},"reference-count":17,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2006,12,1]],"date-time":"2006-12-01T00:00:00Z","timestamp":1164931200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGIR Forum"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:p>We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether the hosts include Web spam aspects or not. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.<\/jats:p>","DOI":"10.1145\/1189702.1189703","type":"journal-article","created":{"date-parts":[[2007,1,17]],"date-time":"2007-01-17T18:32:02Z","timestamp":1169058722000},"page":"11-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":132,"title":["A reference collection for web spam"],"prefix":"10.1145","volume":"40","author":[{"given":"Carlos","family":"Castillo","sequence":"first","affiliation":[{"name":"Universit\u00e0 di Roma, Rome, Italy and Yahoo! Research, Barcelona, Catalunya, Spain"}]},{"given":"Debora","family":"Donato","sequence":"additional","affiliation":[{"name":"Universit\u00e0 di Roma, Rome, Italy and Yahoo! 
Research, Barcelona, Catalunya, Spain"}]},{"given":"Luca","family":"Becchetti","sequence":"additional","affiliation":[{"name":"Universit\u00e0 di Roma, Rome, Italy"}]},{"given":"Paolo","family":"Boldi","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi, Milan, Italy"}]},{"given":"Stefano","family":"Leonardi","sequence":"additional","affiliation":[{"name":"Universit\u00e0 di Roma, Rome, Italy"}]},{"given":"Massimo","family":"Santini","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi, Milan, Italy"}]},{"given":"Sebastiano","family":"Vigna","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi, Milan, Italy"}]}],"member":"320","published-online":{"date-parts":[[2006,12]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD)","author":"Becchetti","year":"2006","unstructured":"{Becchetti et al., 2006} Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and Baeza-Yates, R. (2006). Using rank propagation and probabilistic counting for link-based spam detection. In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD), Pennsylvania, USA. ACM Press."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135954"},{"key":"e_1_2_1_3_1","unstructured":"{Bencz\u00far et al., 2006b} Bencz\u00far, A. A., Csalog\u00e1ny, K., and Sarl\u00f3s, T. (2006b). Link-based similarity search to fight web spam. In Adversarial Information Retrieval on the Web (AIRWEB), Seattle, Washington, USA."},
{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web","author":"Bencz\u00far","year":"2005","unstructured":"{Bencz\u00far et al., 2005} Bencz\u00far, A. A., Csalog\u00e1ny, K., Sarl\u00f3s, T., and Uher, M. (2005). Spamrank: fully automatic link spam detection. In Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web, Chiba, Japan."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.587"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988752"},{"key":"e_1_2_1_7_1","first-page":"37","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","unstructured":"{Cohen, 1960} Cohen, J. (1960). A coefficient of agreement for nominal scales. Psychological Bulletin, 20:37--46.","journal-title":"Psychological Bulletin"},{"key":"e_1_2_1_8_1","first-page":"23","volume-title":"AAAI-2000 Workshop On Artificial Intelligence For Web Search","author":"Davison","year":"2000","unstructured":"{Davison, 2000} Davison, B. D. (2000). Recognizing nepotistic links on the web. In AAAI-2000 Workshop On Artificial Intelligence For Web Search, pages 23--28, Austin, Texas. AAAI Press."},
{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988714"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0031619"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1047671.1047715"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the Twenty-Second Annual Conference of SAS Users Group","author":"Green","year":"1997","unstructured":"{Green, 1997} Green, A. M. (1997). Kappa statistics for multiple raters using categorical classifications. In Proceedings of the Twenty-Second Annual Conference of SAS Users Group, San Diego, USA."},{"key":"e_1_2_1_13_1","volume-title":"First International Workshop on Adversarial Information Retrieval on the Web.","author":"Gy\u00f6ngyi","year":"2005","unstructured":"{Gy\u00f6ngyi and Garcia-Molina, 2005} Gy\u00f6ngyi, Z. and Garcia-Molina, H. (2005). Web spam taxonomy. In First International Workshop on Adversarial Information Retrieval on the Web."},{"key":"e_1_2_1_14_1","first-page":"576","volume-title":"Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB)","author":"Gy\u00f6ngyi","year":"2004","unstructured":"{Gy\u00f6ngyi et al., 2004} Gy\u00f6ngyi, Z., Molina, H. G., and Pedersen, J. (2004). Combating web spam with trustrank. In Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB), pages 576--587, Toronto, Canada. Morgan Kaufmann."},
{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135794"},{"key":"e_1_2_1_16_1","unstructured":"{Page et al., 1998} Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The PageRank citation ranking: bringing order to the Web. Technical report, Stanford Digital Library Technologies Project."},{"key":"e_1_2_1_17_1","volume-title":"The classification of search engine spam. Available online at http:\/\/www.silverdisc.co.uk\/articles\/spam-classification\/","author":"Perkins","year":"2001","unstructured":"{Perkins, 2001} Perkins, A. (2001). The classification of search engine spam. Available online at http:\/\/www.silverdisc.co.uk\/articles\/spam-classification\/."}],
"container-title":["ACM SIGIR Forum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1189702.1189703","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1189702.1189703","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T15:06:22Z","timestamp":1750259182000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1189702.1189703"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,12]]},"references-count":17,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["10.1145\/1189702.1189703"],"URL":"https:\/\/doi.org\/10.1145\/1189702.1189703","relation":{},"ISSN":["0163-5840"],"issn-type":[{"value":"0163-5840","type":"print"}],"subject":[],"published":{"date-parts":[[2006,12]]},"assertion":[{"value":"2006-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}