{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,1]],"date-time":"2025-05-01T22:21:47Z","timestamp":1746138107433},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2010,1,16]],"date-time":"2010-01-16T00:00:00Z","timestamp":1263600000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Lang Resources & Evaluation"],"published-print":{"date-parts":[[2011,3]]},"DOI":"10.1007\/s10579-009-9113-0","type":"journal-article","created":{"date-parts":[[2010,1,15]],"date-time":"2010-01-15T06:20:44Z","timestamp":1263536444000},"page":"25-43","source":"Crossref","is-referenced-by-count":3,"title":["Filtering artificial texts with statistical machine learning techniques"],"prefix":"10.1007","volume":"45","author":[{"given":"Thomas","family":"Lavergne","sequence":"first","affiliation":[]},{"given":"Tanguy","family":"Urvoy","sequence":"additional","affiliation":[]},{"given":"Fran\u00e7ois","family":"Yvon","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,1,16]]},"reference":[{"key":"9113_CR1","doi-asserted-by":"crossref","DOI":"10.1007\/978-94-010-0844-0","volume-title":"Word frequency distributions","author":"R. H. Baayen","year":"2001","unstructured":"Baayen, R. H. (2001). Word frequency distributions. Amsterdam, The Netherlands: Kluwer."},{"key":"9113_CR2","unstructured":"Brants, T., & Franz, A. (2006). Web 1T 5-gram corpus version 1.1. LDC ref: LDC2006T13."},{"key":"9113_CR3","unstructured":"Broder, A. Z., Glassman, S. C., Manasse, M. S., & Zweig, G. (1997). Syntactic clustering of the web. In Computer networks (Vol. 29, pp. 1157\u20131166). Amsterdam: Elsevier."},{"issue":"2","key":"9113_CR4","first-page":"79","volume":"16","author":"P. F. 
Brown","year":"1990","unstructured":"Brown, P. F., Cocke, J., Pietra, S. D., Pietra, V. J. D., Jelinek, F., Lafferty, J. D., Mercer, R. L., & Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79\u201385.","journal-title":"Computational Linguistics"},{"key":"9113_CR5","unstructured":"Bulhak, A. C. (1996). The dada engine. http:\/\/dev.null.org\/dadaengine\/ ."},{"key":"9113_CR6","doi-asserted-by":"crossref","unstructured":"Chen, S. F., & Goodman, J. T. (1996). An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th annual meeting of the association for computational linguistics (ACL) (pp. 310\u2013318). Santa Cruz.","DOI":"10.3115\/981863.981904"},{"key":"9113_CR7","doi-asserted-by":"crossref","DOI":"10.1007\/978-94-017-0171-6","volume-title":"Language modeling for information retrieval","author":"W. B. Croft","year":"2003","unstructured":"Croft, W. B., & Lafferty, J. (2003). Language modeling for information retrieval. Norwell, MA, USA: Kluwer."},{"key":"9113_CR8","unstructured":"Dalkilic, M. M., Clark, W. T., Costello, J. C., & Radivojac, P. (2006). Using compression to identify classes of inauthentic texts. In Proceedings of the SIAM international conference on data mining SDM 2006 (pp. 603\u2013607). Philadelphia, PA, USA: Society for Industrial and Applied Mathematics."},{"key":"9113_CR9","unstructured":"Dalvi, N., Domingos, P., Mausam, Sanghai, S., & Verma, D. (2004). Adversarial classification. In Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD\u201904) (pp. 99\u2013108). New York, NY, USA: ACM."},{"key":"9113_CR10","doi-asserted-by":"crossref","unstructured":"Fetterly, D., Manasse, M., & Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages. In Proceedings of WebDB\u201904 (pp. 1\u20136). 
New York, NY, USA.","DOI":"10.1145\/1017074.1017077"},{"key":"9113_CR11","unstructured":"Fetterly, D., Manasse, M., & Najork, M. (2005). Detecting phrase-level duplication on the world wide web. In SIGIR \u201905: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (pp. 170\u2013177). New York, NY, USA: ACM. doi: 10.1145\/1076034.107606 ."},{"key":"9113_CR12","unstructured":"Gray, A., Sallis, P., & MacDonell, S. (1997). Software forensics: Extending authorship analysis techniques to computer programs. In 3rd Biannual conference of international association of forensic linguists (IAFL \u201997) (pp. 1\u20138)."},{"key":"9113_CR13","unstructured":"Gyongyi, Z., & Garcia-Molina, H. (2005). Web spam taxonomy. In First international workshop on adversarial information retrieval on the web (AIRWeb 2005)."},{"key":"9113_CR14","unstructured":"Gy\u00f6ngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004). Combating web spam with trustRank. In Proceedings of the conference on very large databases (VLDB\u201904) (pp. 576\u2013587). Toronto, Canada: Morgan Kaufmann."},{"issue":"6","key":"9113_CR15","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1109\/MIC.2007.125","volume":"11","author":"P. Heymann","year":"2007","unstructured":"Heymann, P., Koutrika, G., & Garcia-Molina, H. (2007). Fighting spam on social web sites: A survey of approaches and future challenges. IEEE Magazine on Internet Computing, 11(6), 36\u201345.","journal-title":"IEEE Magazine on Internet Computing"},{"issue":"2","key":"9113_CR16","first-page":"172","volume":"7","author":"A. Honor\u00e9","year":"1979","unstructured":"Honor\u00e9, A. (1979). Some simple measures of richness of vocabulary. Association for Literary and Linguistic Computing Bulletin, 7(2), 172\u2013177.","journal-title":"Association for Literary and Linguistic Computing Bulletin"},{"key":"9113_CR17","unstructured":"Jelinek, F. (1990). 
Self-organized language modeling for speech recognition. In A. Waibel & K. F. Lee (Eds.), Readings in speech recognition (pp. 450\u2013506). San Mateo, CA: Morgan Kaufmann."},{"key":"9113_CR18","volume-title":"Statistical methods for speech recognition","author":"F. Jelinek","year":"1997","unstructured":"Jelinek, F. (1997). Statistical methods for speech recognition. Cambridge, MA: The MIT Press."},{"key":"9113_CR19","unstructured":"Ko\u0142cz, A., & Chowdhury, A. (2007). Hardening fingerprinting by context. In CEAS\u201907. Mountain View, CA, USA."},{"key":"9113_CR20","unstructured":"Lavergne, T. (2008). Taxonomie de textes peu-naturels. In Actes des Journ\u00e9es Internationales d\u2019Analyse des Donn\u00e9es Textuelles (JADT\u201908), 2, 679\u2013689."},{"key":"9113_CR21","unstructured":"Lowd, D., & Meek, C. (2005). Adversarial learning. In Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD \u201905) (pp. 641\u2013647). New York, NY, USA: ACM."},{"key":"9113_CR22","volume-title":"Foundations of statistical natural language processing","author":"C. D. Manning","year":"1999","unstructured":"Manning, C. D., & Sch\u00fctze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: The MIT Press."},{"key":"9113_CR23","unstructured":"McEnery, T., & Oakes, M. (2000). Authorship identification and computational stylometry. In Handbook of natural language processing. New York: Marcel Dekker Inc."},{"key":"9113_CR24","unstructured":"Ntoulas, A., Najork, M., Manasse, M., & Fetterly, D. (2006). Detecting spam web pages through content analysis. In WWW \u201906: Proceedings of the 15th international conference on world wide web (pp. 83\u201392). New York, NY, USA: ACM. doi: 10.1145\/1135777.113579 ."},{"key":"9113_CR25","volume-title":"C4.5 : Programs for machine learning","author":"R. Quinlan","year":"1993","unstructured":"Quinlan, R. (1993). C4.5: Programs for machine learning. 
San Francisco: Morgan Kaufmann."},{"key":"9113_CR26","doi-asserted-by":"crossref","unstructured":"Seymore, K., & Rosenfeld, R. (1996). Scalable backoff language models. In Proceedings of the international conference on spoken language processing (ICSLP) (Vol. 1, pp. 232\u2013235). Philadelphia, PA.","DOI":"10.1109\/ICSLP.1996.607084"},{"key":"9113_CR27","unstructured":"Sichel, H. (1975). On a distribution law for word frequencies. In Journal of the American Statistical Association, 70, 542\u2013547."},{"key":"9113_CR28","unstructured":"Siivola, V., & Pellom, B. (2005). Growing an n-gram model. In Proceedings of the 9th international conference on speech technologies INTERSPEECH (pp. 1309\u20131312). Lisbon, Portugal."},{"key":"9113_CR29","doi-asserted-by":"crossref","unstructured":"Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688.","DOI":"10.1038\/163688a0"},{"key":"9113_CR30","doi-asserted-by":"crossref","unstructured":"Stein, B., zu Eissen, S. M., & Potthast, M. (2007). Strategies for retrieving plagiarized documents. In ACM SIGIR (pp. 825\u2013826). New York, NY, USA.","DOI":"10.1145\/1277741.1277928"},{"key":"9113_CR31","unstructured":"Stolcke, A. (1998). Entropy-based pruning of backoff language models. In Proceedings of the DARPA broadcast news transcription and understanding workshop (pp. 270\u2013274). Lansdowne, VA."},{"key":"9113_CR32","doi-asserted-by":"crossref","unstructured":"Stolcke, A. (2002). SRILM\u2014an extensible language modeling toolkit. In Proceedings of the international conference on spoken language processing (ICSLP) (Vol. 2, pp. 901\u2013904). Denver, CO.","DOI":"10.21437\/ICSLP.2002-303"},{"issue":"1","key":"9113_CR33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1326561.1326564","volume":"2","author":"T. Urvoy","year":"2008","unstructured":"Urvoy, T., Chauveau, E., Filoche, P., & Lavergne, T. (2008). Tracking web spam with HTML style similarities. 
ACM Transactions on the Web, 2(1), 1\u201328.","journal-title":"ACM Transactions on the Web"},{"key":"9113_CR34","volume-title":"Data mining: Practical machine learning tools and techniques with java implementations","author":"I. H. Witten","year":"2005","unstructured":"Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques with java implementations. San Francisco: Morgan Kaufmann"},{"key":"9113_CR35","volume-title":"Human behavior and the principle of least effort: An introduction to human ecology","author":"G. K. Zipf","year":"1949","unstructured":"Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Cambridge, MA: Addison-Wesley."}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-009-9113-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10579-009-9113-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-009-9113-0","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,29]],"date-time":"2023-05-29T15:52:58Z","timestamp":1685375578000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10579-009-9113-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,16]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,3]]}},"alternative-id":["9113"],"URL":"https:\/\/doi.org\/10.1007\/s10579-009-9113-0","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"value":"1574-020X","type":"print"},{"value":"1574-0218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010
,1,16]]}}}