{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T06:11:16Z","timestamp":1775283076626,"version":"3.50.1"},"reference-count":206,"publisher":"Emerald","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,6,23]]},"abstract":"<jats:p>Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, makes them difficult to formalize. A typical email user has a working definition no more formal than \"I know it when I see it.\" Yet, current spam filters are remarkably effective, more effective than might be expected given the level of uncertainty and debate over a formal definition of spam, more effective than might be expected given the state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam?<\/jats:p>\n                  <jats:p>We survey current and proposed spam filtering techniques with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media \u2014 such as instant messaging and the Web \u2014 are addressed peripherally. In doing so, we examine the definition of spam, the user\u2019s information requirements and the role of the spam filter as one component of a large and complex information universe. 
Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures and control for differences in experimental setup. Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results and their limitations. In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them.<\/jats:p>","DOI":"10.1561\/1500000006","type":"journal-article","created":{"date-parts":[[2008,7,4]],"date-time":"2008-07-04T16:19:20Z","timestamp":1215188360000},"page":"335-455","source":"Crossref","is-referenced-by-count":194,"title":["Email Spam Filtering: A Systematic Review"],"prefix":"10.1561","volume":"1","author":[{"given":"Gordon V.","family":"Cormack","sequence":"first","affiliation":[{"name":"David R. 
Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1","place":["Canada"]}]}],"member":"140","published-online":{"date-parts":[[2008,6,23]]},"reference":[{"key":"2026040314322155300_ref001","unstructured":"You might be an anti-spam kook if...\n          \n          Rhyolite\n          http:\/\/www.rhyolite.com\/antispam\/you-might-be.html"},{"key":"2026040314322155300_ref002","unstructured":"2004 National Technology Readiness Survey: Summary report\n          \n          2005\n          http:\/\/www.smith.umd.edu\/ntrs\/NTRS 2004.pdf"},{"issue":"1","key":"2026040314322155300_ref003","article-title":"The use of overall accuracy to evaluate the validity of screening or diagnostic tests","volume":"19","author":"Alberg","year":"2004","journal-title":"Journal of General Internal Medicine"},{"key":"2026040314322155300_ref004","article-title":"An evaluation of Naive Bayesian anti-spam filtering","volume":"cs.CL\/0006013","author":"Androutsopoulos","year":"2000","journal-title":"CoRR"},{"key":"2026040314322155300_ref005","article-title":"A game theoretic model of spam e-mailing","author":"Androutsopoulos","year":"2005","journal-title":"CEAS 2005 \u2014 The Second Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref006","first-page":"1","article-title":"Learning to filter spam E-mail: A comparison of a naive Bayesian and a memory-based approach","author":"Androutsopoulos","year":"2000","journal-title":"Workshop on Machine Learning and Textual Information Access, 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000)"},{"key":"2026040314322155300_ref007","author":"Androutsopoulos","year":"2004"},{"key":"2026040314322155300_ref008","doi-asserted-by":"crossref","DOI":"10.1109\/ICDAR.2005.135","article-title":"Image analysis for efficient categorization of image-based spam e-mail","author":"Aradhye","year":"2005","journal-title":"Proceedings of the Eighth International Conference on 
Document Analysis and Recognition (ICDAR\u201905)"},{"key":"2026040314322155300_ref009","unstructured":"Assis\n              F.\n            \n          \n          OSBF-Lua\n          http:\/\/osbf-lua.luaforge.net\/"},{"key":"2026040314322155300_ref010","volume-title":"Fifteenth Text REtrieval Conference (TREC-2006)","author":"Assis","year":"2006"},{"key":"2026040314322155300_ref011","unstructured":"Berg\n              A.\n            \n          \n          Creating an antispam cocktail: Best spam detection and filtering techniques\n          http:\/\/searchsecurity.techtarget.com\/tip\/1,289483,sid14gci1116643,00.html\n          2005"},{"key":"2026040314322155300_ref012","doi-asserted-by":"crossref","DOI":"10.1145\/1273496.1273507","article-title":"Discriminative learning for differing training and test distributions","author":"Bickel","year":"2007","journal-title":"International Conference on Machine Learning (ICML)"},{"key":"2026040314322155300_ref013","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/7503.003.0025","article-title":"Dirichlet-Enhanced spam filtering based on biased samples","author":"Bickel","year":"2007","journal-title":"Neural Information Processing Systems (NIPS)"},{"key":"2026040314322155300_ref014","article-title":"Image spam filtering by content obscuring detection","author":"Biggio","year":"2007","journal-title":"CEAS 2007 \u2014 The Fourth Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref015","unstructured":"Blacklists compared\n          \n          http:\/\/www.sdsc.edu\/jeff\/spam\/BlacklistsCompared.html"},{"key":"2026040314322155300_ref0016","first-page":"2673","article-title":"Spam filtering using statistical data compression models","volume":"7","author":"Bratko","journal-title":"Journal of Machine Learning Research"},{"key":"2026040314322155300_ref017","article-title":"Spam filtering using character-level Markov models: Experiments for the TREC 2005 Spam 
Track","author":"Bratko","year":"2005","journal-title":"Proceedings of the 14th Text REtrieval Conference (TREC 2005)"},{"issue":"3","key":"2026040314322155300_ref018","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1016\/j.ipm.2005.06.003","article-title":"Exploiting structural information for semi-structured document categorization","volume":"42","author":"Bratko","year":"2006","journal-title":"Information Processing and Management"},{"issue":"2","key":"2026040314322155300_ref019","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1023\/A:1018054314350","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Machine Learning"},{"key":"2026040314322155300_ref020","article-title":"Highly scalable discriminative spam filtering","author":"Bruckner","journal-title":"Proceedings of the 15th Text REtrieval Conference (TREC 2006)"},{"key":"2026040314322155300_ref021","unstructured":"Burton\n              B.\n            \n          \n          SpamProbe \u2014 A Fast Bayesian Spam Filter\n          2002\n          http:\/\/spamprobe.sourceforge.net"},{"key":"2026040314322155300_ref022","article-title":"A discriminative classifier learning approach to image modeling and spam image identification","author":"Byun","year":"2007","journal-title":"CEAS 2007 \u2014 The Fourth Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref023","unstructured":"CAPTCHA: Telling humans and computers apart automatically\n          \n          http:\/\/www.captcha.net\/"},{"key":"2026040314322155300_ref024","article-title":"Boosting trees for anti-spam email filtering","author":"Carreras","year":"2001","journal-title":"Proceedings of RANLP-2001, 4th International Conference on Recent Advances in Natural Language Processing"},{"key":"2026040314322155300_ref025","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1145\/1054972.1055070","article-title":"Designing human friendly human interactive proofs 
(HIPS)","author":"Chellapilla","year":"2005","journal-title":"CHI \u201905: SIGCHI Conference on Human Factors in Computing Systems"},{"key":"2026040314322155300_ref026","article-title":"Computers beat humans at single character recognition in reading-based human interaction proofs","author":"Chellapilla","year":"2005","journal-title":"CEAS 2005 \u2014 The Second Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref027","author":"Chhabra","year":"2005"},{"key":"2026040314322155300_ref028","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.patrec.2007.07.018","article-title":"Time-efficient spam e-mail filtering using n-gram models","volume":"29","author":"Ciltik","year":"2008","journal-title":"Pattern Recognition Letters"},{"key":"2026040314322155300_ref029","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1109\/TCOM.1984.1096090","article-title":"Data compression using adaptive coding and partial string matching","volume":"32","author":"Cleary","journal-title":"IEEE Transactions on Communications"},{"key":"2026040314322155300_ref030","first-page":"115","article-title":"Fast effective rule induction","author":"Cohen","journal-title":"Proceedings of the 12th International Conference on Machine Learning"},{"key":"2026040314322155300_ref031","article-title":"Feature engineering for mobile (SMS) spam filtering","author":"Cormack","year":"2007","journal-title":"30th ACM SIGIR Conference on Research and Development in Information Retrieval"},{"key":"2026040314322155300_ref032","article-title":"Harnessing unlabeled examples through iterative application of Dynamic Markov Modeling","author":"Cormack","year":"2006","journal-title":"Proceedings of the ECML\/PKDD Discovery Challenge Workshop"},{"key":"2026040314322155300_ref033","doi-asserted-by":"crossref","DOI":"10.6028\/NIST.SP.500-272.spam-overview","article-title":"TREC 2006 Spam Track Overview","author":"Cormack","year":"2006","journal-title":"Fifteenth Text REtrieval Conference 
(TREC-2006)"},{"key":"2026040314322155300_ref034","article-title":"TREC 2007 Spam Track Overview","author":"Cormack","year":"2007","journal-title":"Sixteenth Text REtrieval Conference (TREC-2007)"},{"key":"2026040314322155300_ref035","article-title":"University of Waterloo participation in the TREC 2007 spam track","author":"Cormack","year":"2007","journal-title":"Sixteenth Text REtrieval Conference (TREC-2007)"},{"key":"2026040314322155300_ref036","article-title":"Batch and on-line spam filter evaluation","author":"Cormack","year":"2006","journal-title":"CEAS 2006: The Third Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref037","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1145\/1321440.1321486","article-title":"Spam filtering for short messages","author":"Cormack","year":"2007","journal-title":"CIKM \u201907: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management"},{"issue":"6","key":"2026040314322155300_ref038","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1093\/comjnl\/30.6.541","article-title":"Data compression using dynamic Markov modelling","volume":"30","author":"Cormack","year":"1987","journal-title":"The Computer Journal"},{"key":"2026040314322155300_ref039","unstructured":"Cormack\n              G. V.\n            \n            \n              Lynam\n              T. R.\n            \n          \n          TREC Spam Filter Evaluation Toolkit\n          http:\/\/plg.uwaterloo.ca\/~gvcormac\/jig\/"},{"key":"2026040314322155300_ref040","doi-asserted-by":"crossref","unstructured":"Cormack\n              G. V.\n            \n            \n              Lynam\n              T. 
R.\n            \n          \n          TREC 2005 Spam Track Overview\n          http:\/\/plg.uwaterloo.ca\/~gvcormac\/trecspamtrack05\n          2005","DOI":"10.6028\/NIST.SP.500-266.spam-overview"},{"key":"2026040314322155300_ref041","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1145\/1148170.1148262","article-title":"Statistical precision of information retrieval evaluation","author":"Cormack","year":"2006","journal-title":"SIGIR \u201906: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval"},{"issue":"3","key":"2026040314322155300_ref042","doi-asserted-by":"crossref","DOI":"10.1145\/1247715.1247717","article-title":"On-line supervised spam filter evaluation","volume":"25","author":"Cormack","year":"2007","journal-title":"ACM Transactions on Information Systems"},{"issue":"4","key":"2026040314322155300_ref043","article-title":"Challenges in Anti-spam Efforts","volume":"8","author":"Crocker","year":"2006","journal-title":"The Internet Protocol Journal"},{"key":"2026040314322155300_ref044","first-page":"99","article-title":"Adversarial classification","author":"Dalvi","year":"2004","journal-title":"KDD"},{"key":"2026040314322155300_ref045","first-page":"5","article-title":"Detecting spam in VoIP networks","author":"Dantu","year":"2005","journal-title":"SRUTI\u201905: Steps to Reducing Unwanted Traffic on the Internet Workshop"},{"key":"2026040314322155300_ref046","first-page":"15","article-title":"The mechanics of Vipul\u2019s Razor","author":"Deguerre","journal-title":"Network Security"},{"issue":"3\u20134","key":"2026040314322155300_ref047","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1007\/s10462-005-9006-6","article-title":"Case-based reasoning for spam filtering","volume":"24","author":"Delany","year":"2005","journal-title":"Artificial Intelligence 
Review"},{"key":"2026040314322155300_ref048","author":"Dietterich","year":"1996"},{"key":"2026040314322155300_ref049","article-title":"A game-theoretic investigation of the effect of human interactive proofs on spam e-mail","author":"Dimitrios","year":"2007","journal-title":"CEAS 2007 \u2014 The Fourth Conference on Email and Anti-Spam"},{"issue":"2","key":"2026040314322155300_ref050","doi-asserted-by":"crossref","DOI":"10.1145\/1144403.1144407","article-title":"Peer-to-peer Collaborative Spam Detection","volume":"11","author":"Dimmock","year":"2004","journal-title":"ACM Crossroads"},{"issue":"2\u20133","key":"2026040314322155300_ref051","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1023\/A:1007413511361","article-title":"On the optimality of the simple Bayesian classifier under zero-one loss","volume":"29","author":"Domingos","year":"1997","journal-title":"Machine Learning"},{"key":"2026040314322155300_ref052","article-title":"Learning fast classifiers for image spam","author":"Dredze","year":"2007","journal-title":"CEAS 2007 \u2014 The Fourth Conference on Email and Anti-Spam"},{"issue":"5","key":"2026040314322155300_ref053","first-page":"1048","article-title":"Support vector machines for spam categorization","volume":"10","author":"Drucker","year":"1999","journal-title":"IEEE Transactions on Neural Networks"},{"issue":"5","key":"2026040314322155300_ref054","doi-asserted-by":"crossref","first-page":"1048","DOI":"10.1109\/72.788645","article-title":"Support vector machines for spam categorization","volume":"10","author":"Drucker","year":"1999","journal-title":"IEEE Transactions on Neural Networks"},{"key":"2026040314322155300_ref055","article-title":"Pricing via processing or combatting junk mail","author":"Dwork","year":"1992","journal-title":"CRYPTO \u201992"},{"key":"2026040314322155300_ref056","unstructured":"ECML\/PKDD Discovery Challenge\n          \n          http:\/\/www.ecmlpkdd2006.org\/challenge.htm\n          
2006"},{"issue":"2","key":"2026040314322155300_ref057","article-title":"\u2018In vivo\u2019 spam filtering: A challenge problem for data mining","volume":"5","author":"Fawcett","journal-title":"KDD Explorations"},{"key":"2026040314322155300_ref058","unstructured":"Fawcett\n              T.\n            \n          \n          ROC Graphs: Notes and Practical Considerations for Researchers\n          http:\/\/home.comcast.net\/~tom.fawcett\/public_html\/papers\/ROC101.pdf\n          2004"},{"key":"2026040314322155300_ref059","unstructured":"Ferris\n              D.\n            \n            \n              Jennings\n              R.\n            \n          \n          Calculating the Cost of Spam for Your Organization\n          http:\/\/www.ferris.com\/?p=310061\n          2005"},{"key":"2026040314322155300_ref060","unstructured":"Ferris\n              D.\n            \n            \n              Jennings\n              R.\n            \n            \n              Williams\n              C.\n            \n          \n          The Global Economic Impact of Spam\n          http:\/\/www.ferris.com\/?p=309942\n          2005"},{"key":"2026040314322155300_ref061","unstructured":"Final ultimate solution to the spam problem\n          \n          http:\/\/craphound.com\/spamsolutions.txt"},{"key":"2026040314322155300_ref062","first-page":"2699","article-title":"Spam filtering based on the analysis of text information embedded into images","volume":"7","author":"Fumera","year":"2006","journal-title":"Journal of Machine Learning Research (special issue on Machine Learning in Computer Security)"},{"key":"2026040314322155300_ref063","article-title":"Spam filtering based on latent semantic indexing","author":"Gansterer","year":"2007","journal-title":"SIAM Conference on Data Mining"},{"key":"2026040314322155300_ref064","doi-asserted-by":"crossref","first-page":"460","DOI":"10.1145\/952532.952623","article-title":"Using latent semantic indexing to filter 
spam","author":"Gee","year":"2003","journal-title":"SAC \u201903: Proceedings of the 2003 ACM Symposium on Applied Computing"},{"key":"2026040314322155300_ref065","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1007\/978-3-540-28650-9_5","volume-title":"Advanced Lectures in Machine Learning, Lecture Notes in Computer Science","author":"Ghahramani","year":"2004"},{"issue":"11","key":"2026040314322155300_ref066","doi-asserted-by":"crossref","first-page":"1129","DOI":"10.1016\/S0895-4356(03)00177-X","article-title":"The diagnostic odds ratio: A single indicator of test performance","volume":"56","author":"Glas","year":"2003","journal-title":"Journal of Clinical Epidemiology"},{"key":"2026040314322155300_ref067","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1038\/scientificamerican0405-42","article-title":"Stopping spam","volume":"292","author":"Goodman","journal-title":"Scientific American"},{"key":"2026040314322155300_ref068","article-title":"Online discriminative spam filter training","author":"Goodman","year":"2006","journal-title":"The Third Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref069","unstructured":"Graham\n              P.\n            \n          \n          Better Bayesian Filtering\n          http:\/\/www.paulgraham.com\/better.html\n          2004"},{"key":"2026040314322155300_ref070","article-title":"How to beat an adaptive spam filter","author":"Graham-Cumming","year":"2004","journal-title":"The Spam Conference"},{"key":"2026040314322155300_ref071","article-title":"People and spam","author":"Graham-Cumming","year":"2005","journal-title":"The Spam Conference"},{"key":"2026040314322155300_ref072","article-title":"Does Bayesian poisoning exist?","author":"Graham-Cumming","journal-title":"Virus Bulletin"},{"key":"2026040314322155300_ref073","article-title":"SpamOrHam","author":"Graham-Cumming","journal-title":"Virus Bulletin"},{"key":"2026040314322155300_ref074","article-title":"The rise and fall of image-based 
spam","author":"Graham-Cumming","journal-title":"Virus Bulletin"},{"key":"2026040314322155300_ref075","article-title":"The spammer\u2019s compendium: Five years on","author":"Graham-Cumming","journal-title":"Virus Bulletin"},{"key":"2026040314322155300_ref076","doi-asserted-by":"crossref","unstructured":"Graham-Cumming\n              J.\n            \n          \n          Why I hate challenge-response\n          JGC\u2019s Anti-Spam Newsletter\n          February 28, 2005","DOI":"10.1353\/cal.2005.0047"},{"key":"2026040314322155300_ref077","unstructured":"Greylisting: The next step in the spam control war\n          \n          http:\/\/projects.puremagic.com\/greylisting\/\n          2003"},{"key":"2026040314322155300_ref078","unstructured":"Guenter\n              B.\n            \n          \n          Spam Archive\n          http:\/\/www.untroubled.org\/spam\/"},{"key":"2026040314322155300_ref079","unstructured":"Gupta\n              K.\n            \n            \n              Chaudhary\n              V.\n            \n            \n              Marwah\n              N.\n            \n            \n              Taneja\n              C.\n            \n          \n          ECML-PKDD Discovery Challenge Entry\n          Inductis India Pvt Ltd\n          2006"},{"key":"2026040314322155300_ref080","article-title":"Using positive-only learning to deal with the heterogeneity of labeled and unlabeled data","author":"Gupta","year":"2006","journal-title":"ECML\/PKDD Discovery Challenge Workshop"},{"key":"2026040314322155300_ref081","doi-asserted-by":"crossref","DOI":"10.1511\/2007.66.298","article-title":"How many ways can you spell Viagra?","volume":"95","author":"Hayes","year":"2007","journal-title":"American Scientist"},{"key":"2026040314322155300_ref082","first-page":"615","article-title":"Evaluating cost-sensitive unsolicited bulk email categorization","volume-title":"Proceedings of SAC-02, 17th ACM Symposium on Applied 
Computing","author":"Hidalgo","year":"2002"},{"key":"2026040314322155300_ref083","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1145\/508791.508911","article-title":"Evaluating cost-sensitive unsolicited bulk email categorization","author":"Hidalgo","year":"2002","journal-title":"SAC \u201902: Proceedings of the ACM Symposium on Applied Computing"},{"key":"2026040314322155300_ref084","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1166160.1166191","article-title":"Content based SMS spam filtering","author":"Hidalgo","year":"2006","journal-title":"DocEng \u201906: Proceedings of the 2006 ACM Symposium on Document Engineering"},{"key":"2026040314322155300_ref085","first-page":"99","article-title":"Combining text and heuristics for cost-sensitive spam filtering","author":"Hidalgo","year":"2000","journal-title":"2nd Workshop on Learning Language in Logic and 4th Conference on Computational Natural Language Learning"},{"key":"2026040314322155300_ref086","first-page":"68","article-title":"Spam Filtering II","author":"Holden","journal-title":"Hakin9"},{"key":"2026040314322155300_ref087","author":"Hosmer","year":"2000"},{"key":"2026040314322155300_ref088","article-title":"Naive Bayes spam filtering using word-position-based attributes","author":"Hovold","year":"2005","journal-title":"2nd Conference on Email and Anti-Spam (CEAS 2005)"},{"key":"2026040314322155300_ref089","unstructured":"Ilger\n              M.\n            \n            \n              Strauss\n              J.\n            \n            \n              Gansterer\n              W.\n            \n            \n              Proschinger\n              C.\n            \n          \n          The Economy of Spam\n          Institute of Distributed and Multimedia Systems, University of Vienna\n          FA384018-6\n          2006"},{"key":"2026040314322155300_ref090","first-page":"143","article-title":"A probabilistic analysis of the Rocchio algorithm with TF-IDF for text 
categorization","author":"Joachims","year":"1997","journal-title":"ICML-97, 14th International Conference on Machine Learning"},{"key":"2026040314322155300_ref091","author":"Joachims","year":"1999"},{"key":"2026040314322155300_ref092","first-page":"338","article-title":"Estimating continuous distributions in Bayesian classifiers","author":"John","year":"1995","journal-title":"Eleventh Conference on Uncertainty in Artificial Intelligence"},{"key":"2026040314322155300_ref093","article-title":"A two-pass statistical approach for automatic personalized spam filtering","author":"Junejo","year":"2006","journal-title":"ECML\/PKDD Discovery Challenge Workshop"},{"key":"2026040314322155300_ref094","article-title":"Introducing the Enron corpus","author":"Klimt","year":"2004","journal-title":"CEAS 2004 \u2014 The Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref095","first-page":"1137","article-title":"A study of cross-validation and bootstrap for accuracy estimation and model selection","author":"Kohavi","year":"1995","journal-title":"IJCAI"},{"key":"2026040314322155300_ref096","article-title":"SVM-based filtering of E-mail spam with content specific misclassification costs","author":"Kolcz","year":"2001","journal-title":"TextDM 2001 (IEEE ICDM-2001 Workshop on Text Mining)"},{"key":"2026040314322155300_ref097","article-title":"Hardening fingerprints by context","author":"Kolcz","year":"2007","journal-title":"CEAS 2007 \u2014 The Fourth Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref098","article-title":"Lexicon randomization for near-duplicate detection with I-match","volume":"DOI 10.1007\/s11227-007-0171-z","author":"Kolcz","year":"2007","journal-title":"Journal of Supercomputing"},{"key":"2026040314322155300_ref099","article-title":"The impact of feature selection on signature-driven spam detection","author":"Kolcz","year":"2004","journal-title":"CEAS 2004 \u2014 The Conference on Email and 
Anti-Spam"},{"key":"2026040314322155300_ref100","article-title":"Fast robust logistic regression for large sparse datasets with binary outputs","author":"Komarek","year":"2003","journal-title":"Artificial Intelligence and Statistics"},{"key":"2026040314322155300_ref101","article-title":"Searching for John Doe: Finding spammers and phishers","author":"Kornblum","year":"2004","journal-title":"CEAS 2004 \u2014 The Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref102","first-page":"249","article-title":"Supervised learning: A review of classification techniques","volume":"31","author":"Kotsiantis","year":"2007","journal-title":"Informatica"},{"key":"2026040314322155300_ref103","article-title":"In the fight against spam E-mail, Goliath wins again","author":"Krebs","year":"17, 2006","journal-title":"Washington Post"},{"key":"2026040314322155300_ref104","article-title":"Spam deobfuscation using a hidden Markov model","author":"Lee","year":"2005","journal-title":"CEAS 2005 \u2014 The Second Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref105","first-page":"2523","article-title":"Dynamically weighted hidden Markov model for spam deobfuscation","author":"Lee","year":"2007","journal-title":"IJCAI 07"},{"key":"2026040314322155300_ref106","article-title":"Experiences with greylisting","author":"Levine","year":"2005","journal-title":"CEAS 2005: Second Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref107","first-page":"148","article-title":"Heterogeneous uncertainty sampling for supervised learning","author":"Lewis","year":"1994","journal-title":"ICML-94, 11th International Conference on Machine Learning"},{"key":"2026040314322155300_ref108","first-page":"298","article-title":"Training algorithms for linear text classifiers","author":"Lewis","year":"1996","journal-title":"SIGIR-96, 19th ACM International Conference on Research and Development in Information Retrieval"},{"key":"2026040314322155300_ref109","article-title":"Resisting SPAM 
delivery by TCP damping","author":"Li","year":"2004","journal-title":"CEAS 2004 \u2014 The Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref110","article-title":"DomainKeys Identified Mail (DKIM): Using digital signatures for domain verification","author":"Leiba","year":"2007","journal-title":"CEAS 2007: The Fourth Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref111","article-title":"SMTP path analysis","author":"Leiba","year":"2005","journal-title":"2nd Conference on Email and Anti-Spam"},{"key":"2026040314322155300_ref112","unstructured":"Ling-Spam, PU and Enron Corpora\n          \n          http:\/\/www.iit.demokritos.gr\/skel\/iconfig\/downloads\/"},{"key":"2026040314322155300_ref113","article-title":"TREC Spam Filter Evaluation Toolkit","author":"Lynam"},{"key":"2026040314322155300_ref114","volume-title":"29th ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Lynam","year":"2006"},{"key":"2026040314322155300_ref115","author":"Lyon","year":"2006"},{"key":"2026040314322155300_ref116","unstructured":"Mail abuse prevention system\n          \n          http:\/\/www.mail-abuse.com\/\n          2005"},{"key":"2026040314322155300_ref117","article-title":"For bulk E-mailer, pestering millions offers path to profit","author":"Mangalindan","year":"13, 2002","journal-title":"Wall Street Journal"},{"key":"2026040314322155300_ref118","doi-asserted-by":"crossref","first-page":"1895","DOI":"10.21437\/Eurospeech.1997-504","article-title":"The DET curve in assessment of detection task performance","author":"Martin","year":"1997","journal-title":"Eurospeech \u201997"},{"key":"2026040314322155300_ref119","article-title":"US Court threatens Spamhaus with shut down","author":"McMillan","year":"2006","journal-title":"InfoWorld"},{"key":"2026040314322155300_ref120","article-title":"An adaptive, semi-structured language model approach to spam filtering on a new corpus","volume-title":"Proceedings of CEAS 2006 \u2014 Third 
Conference on Email and Anti-Spam","author":"Medlock","year":"2006"},{"key":"2026040314322155300_ref121","article-title":"Naive Bayes \u2014 Which Naive Bayes?","volume-title":"Proceedings of CEAS 2006 \u2014 Third Conference on Email and Anti-Spam","author":"Metsis","year":"2006"},{"key":"2026040314322155300_ref122","article-title":"Filtron: A learning-based anti-spam filter","volume-title":"Proceedings of the 1st Conference on Email and Anti-Spam (CEAS 2004)","author":"Michelakis","year":"2004"},{"key":"2026040314322155300_ref123","unstructured":"Mishne\n              G.\n            \n            \n              Carmel\n              D.\n            \n          \n          Blocking Blog Spam with Language Model Disagreement\n          2005"},{"key":"2026040314322155300_ref124","article-title":"Chung-Kwei: A pattern discovery-based System for the automatic identification of unsolicited email messages (spam)","volume-title":"CEAS 2004 \u2014 The Conference on Email and Anti-Spam","author":"Moustakas","year":"2004"},{"key":"2026040314322155300_ref125","article-title":"WIM at TREC 2007","volume-title":"Sixteenth Text REtrieval Conference (TREC-2007)","author":"Niu","year":"2007"},{"key":"2026040314322155300_ref126","article-title":"Splitting the unsupervised and supervised components of semi-supervised learning","volume-title":"ICML 2005 LPCTD Workshop","author":"Oliveira","year":"2005"},{"key":"2026040314322155300_ref127","doi-asserted-by":"crossref","unstructured":"Pampapathi\n              R. M.\n            \n            \n              Mirkin\n              B.\n            \n            \n              Levene\n              M.\n            \n          \n          A suffix tree approach to email filtering\n          Birkbeck University of London\n          2005","DOI":"10.1007\/s10994-006-9505-y"},{"key":"2026040314322155300_ref128","first-page":"211","article-title":"Tree induction vs. 
logistic regression: A learning-curve analysis","volume":"4","author":"Perlich","year":"2003","journal-title":"Journal of Machine Learning and Research"},{"key":"2026040314322155300_ref129","volume-title":"Proceedings of ECML\/PKDD Discovery Challenge Workshop","author":"Pfahringer","year":"2006"},{"key":"2026040314322155300_ref130","unstructured":"Project Honeypot\n          \n          http:\/\/www.projecthoneypot.org\/"},{"key":"2026040314322155300_ref131","article-title":"Observed trends in spam construction techniques","volume-title":"Proceedings of CEAS 2006 \u2014 Third Journal on Email and Anti-Spam","author":"Pu","year":"2006"},{"key":"2026040314322155300_ref132","author":"Quinlan","year":"1993"},{"key":"2026040314322155300_ref133","article-title":"Can DNS-based blacklists keep up with bots?","volume-title":"CEAS 2006 \u2014 The Second Journal on Email and Anti-Spam","author":"Ramachandran","year":"2006"},{"key":"2026040314322155300_ref134","unstructured":"Raymond\n              E. S.\n            \n            \n              Relson\n              D.\n            \n            \n              Andree\n              M.\n            \n            \n              Louis\n              G.\n            \n          \n          BogoFilter\n          http:\/\/bogofilter.sourceforge.net\/\n          2004"},{"key":"2026040314322155300_ref135","unstructured":"Rideau\n              F. R.\n            \n          \n          Stamps vs. 
Spam: Postage as a Method to Eliminate Unsolicited Email\n          http:\/\/fare.tunes.org\/articles\/stamps vs spam.html\n          2002"},{"issue":"3","key":"2026040314322155300_ref136","article-title":"A statistical approach to the spam problem","volume":"107","author":"Robinson","year":"2003","journal-title":"Linux Journal"},{"issue":"15","key":"2026040314322155300_ref137","doi-asserted-by":"crossref","first-page":"2739","DOI":"10.1016\/j.comcom.2005.10.037","article-title":"An anti-spam scheme using prechallenges","volume":"29","author":"Roman","year":"2006","journal-title":"Computer Communications"},{"key":"2026040314322155300_ref138","unstructured":"Rossow\n              C.\n            \n          \n          Anti-Spam measures of European ISPs\/ESPs: A survey based analysis of state-of-the-art technologies, current spam trends and recommendations for future-oriented anti-spam concepts\n          Institute for Internet Security\n          2007\n          08"},{"key":"2026040314322155300_ref139","author":"Rothman","year":"1998"},{"key":"2026040314322155300_ref140","unstructured":"Rowland\n              R.\n            \n          \n          Spam, Spam, Spam: The Cyberspace Wars\n          CBC\n          http:\/\/www.cbc.ca\/news\/background\/spam\/\n          2004"},{"key":"2026040314322155300_ref141","article-title":"A Bayesian approach to filtering junk e-mail","volume-title":"Learning for Text Categorization: Papers from the 1998 Workshop","author":"Sahami","year":"1998"},{"key":"2026040314322155300_ref142","first-page":"44","article-title":"Stacking classifiers for anti-spam filtering of e-mail","volume-title":"Empirical Methods in Natural Language Processing (EMNLP 2001)","author":"Sakkis","year":"2001"},{"issue":"1","key":"2026040314322155300_ref143","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1023\/A:1022948414856","article-title":"A memory-based approach to anti-spam filtering for mailing 
lists","volume":"6","author":"Sakkis","year":"2003","journal-title":"Information Retrieval"},{"key":"2026040314322155300_ref144","first-page":"316","article-title":"Spam detection using text clustering","author":"Sasaki","year":"2005"},{"issue":"3","key":"2026040314322155300_ref145","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1023\/A:1007614523901","article-title":"Improved boosting using confidence-rated predictions","volume":"37","author":"Schapire","year":"1999","journal-title":"Machine Learning"},{"key":"2026040314322155300_ref146","article-title":"A comparison of event models for naive Bayes anti-spam email filtering","author":"Schneider","year":"2003"},{"key":"2026040314322155300_ref147","article-title":"Online active learning methods for fast label-efficient spam filtering","author":"Sculley","year":"2007"},{"key":"2026040314322155300_ref148","first-page":"332","article-title":"Compression and machine learning: A new perspective on feature space vectors","author":"Sculley","year":"2006"},{"key":"2026040314322155300_ref149","unstructured":"Sculley\n              D.\n            \n            \n              Cormack\n              G. 
V.\n            \n          \n          Filtering Spam in the Presence of Noisy User Feedback\n          Tufts University\n          2008"},{"key":"2026040314322155300_ref150","article-title":"Relaxed online support vector machines for spam filtering","author":"Sculley","year":"2007"},{"key":"2026040314322155300_ref151","article-title":"Relaxed online SVMs in the TREC Spam filtering track","author":"Sculley","year":"2007"},{"key":"2026040314322155300_ref152","article-title":"Spam classification with on-line linear classifiers and inexact string matching features","author":"Sculley","year":"2006","journal-title":"Proceedings of the 15th Text REtrieval Journal (TREC 2006)"},{"issue":"1","key":"2026040314322155300_ref153","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/505282.505283","article-title":"Machine learning in automated text categorization","volume":"34","author":"Sebastiani","year":"2002","journal-title":"ACM Computing Surveys"},{"key":"2026040314322155300_ref154","article-title":"SpamGuru: An enterprise anti-spam filtering system","author":"Segal","year":"2004","journal-title":"First Journal on Email and Anti-Spam (CEAS)"},{"key":"2026040314322155300_ref155","article-title":"Fast uncertainty sampling for labeling large e-mail corpora","author":"Segal","year":"2006","journal-title":"Proceedings of CEAS 2006 \u2014 Third Journal on Email and Anti-Spam"},{"key":"2026040314322155300_ref156","volume-title":"Nearest-Neighbor Methods in Learning and Vision","author":"Shakhnarovish","year":"2005"},{"key":"2026040314322155300_ref157","article-title":"Fighting spam with reputation systems","author":"Sharma","year":"2005","journal-title":"ACM Queue"},{"key":"2026040314322155300_ref158","first-page":"410","article-title":"Combining winnow and orthogonal sparse bigrams for incremental spam filtering","author":"Siefkes","year":"2004","journal-title":"PKDD \u201904: 8th European Journal on Principles and Practice of Knowledge Discovery in 
Databases"},{"key":"2026040314322155300_ref159","article-title":"Using character recognition and segmentation to tell computer from humans","author":"Simard","year":"2003","journal-title":"ICDAR \u201903: Seventh International Journal on Document Analysis and Recognition"},{"key":"2026040314322155300_ref160","article-title":"Spam in the wild, the sequel","author":"Snyder","year":"2004","journal-title":"Network World"},{"key":"2026040314322155300_ref161","article-title":"The effects of anti-spam methods on spam mail","author":"Solan","year":"2006","journal-title":"CEAS 2006 \u2014 Third Journal on Email and Anti-Spam"},{"key":"2026040314322155300_ref162","unstructured":"Spam testing methodology\n          \n          2007\n          http:\/\/www.opus1.com\/www\/whitepapers\/spamtestmethodology.pd"},{"key":"2026040314322155300_ref163","unstructured":"The Spamassassin Public Mail Corpus\n          \n          2003\n          http:\/\/spamassassin.apache.org\/publiccorpus"},{"key":"2026040314322155300_ref164","unstructured":"Welcome to SpamAssassin\n          \n          2005\n          http:\/\/spamassassin.apache.org"},{"key":"2026040314322155300_ref165","unstructured":"Spambase\n          \n          http:\/\/mlearn.ics.uci.edu\/databases\/spambase\/"},{"key":"2026040314322155300_ref166","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1002\/asi.4630200110","article-title":"Effectiveness of information retrieval systems","volume":"20","author":"Swets","year":"1969","journal-title":"American Documentation"},{"key":"2026040314322155300_ref167","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/ICDS.2008.15","article-title":"Spam mail reduces economic effects","author":"Takemura","year":"2008","journal-title":"Second International Journal on the Digital Society"},{"key":"2026040314322155300_ref168","article-title":"Improving spam filtering by detecting gray mail","author":"TauYih","year":"2007","journal-title":"Proceedings of CEAS 2007 \u2014 Fourth 
Journal on Email and Anti-Spam"},{"key":"2026040314322155300_ref169","first-page":"224","article-title":"Tuning the hyperparameter of an AUC optimized classifier","author":"Tax","year":"2005","journal-title":"Seventeenth Belgium-Netherlands Journal on Artificial Intelligence"},{"key":"2026040314322155300_ref170","unstructured":"The CEAS 2007 Live Spam Challenge\n          \n          2007\n          http:\/\/www.ceas.cc\/2007\/challenge\/challenge.html"},{"key":"2026040314322155300_ref171","unstructured":"The penny black project\n          \n          http:\/\/research.microsoft.com\/research\/sv\/PennyBlack\/"},{"key":"2026040314322155300_ref172","unstructured":"TREC 2005 Spam Corpus\n          \n          2005\n          http:\/\/plg.uwaterloo.ca\/~gvcormac\/treccorpus"},{"key":"2026040314322155300_ref173","unstructured":"TREC 2006 Spam Corpora\n          \n          2006\n          http:\/\/plg.uwaterloo.ca\/~gvcormac\/treccorpus"},{"key":"2026040314322155300_ref174","unstructured":"TREC 2007 Spam Corpus\n          \n          2007\n          http:\/\/plg.uwaterloo.ca\/~gvcormac\/treccorpus"},{"key":"2026040314322155300_ref175","article-title":"Machine learning techniques in spam filtering","author":"Tretyakov","year":"2004","journal-title":"Technical Report"},{"key":"2026040314322155300_ref176","volume-title":"Proceedings of ECML\/PKDD Discovery Challenge Workshop","author":"Trogkanis","year":"2006"},{"key":"2026040314322155300_ref177","unstructured":"Tschabitscher\n              H.\n            \n          \n          What you Need to Know about Challenge-Response Spam Filters\n          http:\/\/email.about.com\/cs\/spamgeneral\/a\/challenge_resp.htm"},{"key":"2026040314322155300_ref178","author":"Turner","year":"2007"},{"key":"2026040314322155300_ref179","volume-title":"Technical Report CS-2004-03","author":"Tuttle","year":"2004"},{"key":"2026040314322155300_ref180","volume-title":"Information Retrieval","author":"Van 
Rijsbergen","year":"1979","edition":"Second"},{"key":"2026040314322155300_ref181","unstructured":"Veritest Anti-Spam Benchmark Service Autumn 2005 Report\n          \n          2005\n          http:\/\/www.tumbleweed.com\/pdfs\/VeriTest_Anti-Spam_Report_Vol4_all_c.pdf"},{"key":"2026040314322155300_ref182","volume-title":"Fourteenth Text REtrieval Journal (TREC-2005)","author":"Voorhees","year":"2005"},{"key":"2026040314322155300_ref183","volume-title":"Fifteenth Text REtrieval Journal (TREC-2006)","author":"Voorhees","year":"2006"},{"key":"2026040314322155300_ref184","volume-title":"Sixteenth Text REtrieval Journal (TREC-2007)","author":"Voorhees","year":"2007"},{"key":"2026040314322155300_ref185","volume-title":"TREC \u2014 Experiment and Evaluation in Information Retrieval","author":"Voorhees","year":"2005"},{"key":"2026040314322155300_ref186","article-title":"Filtering image spam with near-duplicate detection","author":"Wang","year":"2007","journal-title":"CEAS 2007 \u2014 Third Journal on Email and Anti-Spam"},{"key":"2026040314322155300_ref187","author":"Web Spam Challenge","year":"2008"},{"key":"2026040314322155300_ref188","volume-title":"Proceedings of CEAS 2006 \u2014 Third Journal on Email and Anti-Spam","author":"Webb","year":"2006"},{"key":"2026040314322155300_ref189","unstructured":"West Coast Labs\n          \n          http:\/\/www.westcoastlabs.com"},{"issue":"3","key":"2026040314322155300_ref190","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1109\/18.382012","article-title":"The context-tree weighting method: Basic properties","volume":"41","author":"Willems","year":"1995","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026040314322155300_ref191","article-title":"On attacking statistical spam filters","author":"Wittel","year":"2004","journal-title":"CEAS 2004 \u2014 The Journal on Email and Anti-Spam"},{"key":"2026040314322155300_ref192","volume-title":"Weka: Practical Machine Learning Tools and Techniques with Java 
Implementations","author":"Witten","year":"1999"},{"key":"2026040314322155300_ref193","volume":"RFC 4408","author":"Wong","year":"2006","journal-title":"Sender Policy Framework (SPF) for Authorizing Use of Domains in E-mail"},{"key":"2026040314322155300_ref194","unstructured":"Yerazunis\n              W.\n            \n          \n          2002\n          Correspondence with Paul Graham\n          http:\/\/www.paulgraham.com\/wsy.html"},{"key":"2026040314322155300_ref195","unstructured":"Yerazunis\n              W. S.\n            \n          \n          2004\n          CRM114 \u2014 the Controllable Regex Mutilator\n          http:\/\/crm114.sourceforge.net\/"},{"key":"2026040314322155300_ref196","article-title":"The spam-filtering accuracy plateau at 99.9% accuracy and how to get past it","author":"Yerazunis","year":"2004","journal-title":"2004 MIT Spam Journal"},{"key":"2026040314322155300_ref197","volume-title":"Proceedings 15th Text REtrieval Journal (TREC 2006)","author":"Yerazunis","year":"2006"},{"key":"2026040314322155300_ref198","volume-title":"Sixteenth Text REtrieval Journal (TREC-2007)","author":"Yerazunis","year":"2007"},{"key":"2026040314322155300_ref199","article-title":"Learning at low false positive rates","author":"Yih","year":"2006","journal-title":"Proceedings of the 3rd Journal on Email and Anti-Spam"},{"key":"2026040314322155300_ref200","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1007\/s00500-006-0116-0","article-title":"Artificial immune system inspired behavior-based anti-spam filter","volume":"11","author":"Yue","year":"2007","journal-title":"Soft Computing"},{"issue":"4","key":"2026040314322155300_ref201","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1145\/1039621.1039625","article-title":"An evaluation of statistical spam filtering techniques","volume":"3","author":"Zhang","year":"2004","journal-title":"ACM Transactions on Asian Language Information Processing 
(TALIP)"},{"key":"2026040314322155300_ref202","doi-asserted-by":"crossref","DOI":"10.1109\/AMT.2005.1505383","article-title":"An email classification model based on rough set theory","author":"Zhao","year":"2005","journal-title":"Active Media Technology (AMT 2005)"},{"key":"2026040314322155300_ref203","article-title":"Learning with local and global consistency","author":"Zhou","year":"2003","journal-title":"18th Annual Journal on Neural Information Processing Systems"},{"key":"2026040314322155300_ref204","volume-title":"Semi-supervised Learning Literature Survey","author":"Zhu","year":"2007"},{"key":"2026040314322155300_ref205","volume-title":"Oral Presentation, ECML\/PKDD Discovery Challenge Workshop","author":"Zien","year":"2006"},{"key":"2026040314322155300_ref206","volume-title":"Proceedings of CEAS 2007 \u2014 Fourth Journal on Email and Anti-Spam","author":"Zinman","year":"2007"}],"container-title":["Foundations and Trends\u00ae in Information Retrieval"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftinr\/article-pdf\/1\/4\/335\/11024541\/1500000006en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftinr\/article-pdf\/1\/4\/335\/11024541\/1500000006en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T18:33:09Z","timestamp":1775241189000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftinr\/article\/1\/4\/335\/1326501\/Email-Spam-Filtering-A-Systematic-Review"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,6,23]]},"references-count":206,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2008,6,23]]}},"URL":"https:\/\/doi.org\/10.1561\/1500000006","relation":{},"ISSN":["1554-0669","1554-0677"],"issn-type":[{"value":"1554-0669","type":"print"},{"value":"1554-0677","type":"electr
onic"}],"subject":[],"published":{"date-parts":[[2008,6,23]]}}}