{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T17:49:58Z","timestamp":1780422598175,"version":"3.54.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2009,2,10]],"date-time":"2009-02-10T00:00:00Z","timestamp":1234224000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Lang Resources & Evaluation"],"published-print":{"date-parts":[[2009,9]]},"DOI":"10.1007\/s10579-009-9081-4","type":"journal-article","created":{"date-parts":[[2009,2,9]],"date-time":"2009-02-09T10:33:17Z","timestamp":1234175597000},"page":"209-226","source":"Crossref","is-referenced-by-count":441,"title":["The WaCky wide web: a collection of very large linguistically processed web-crawled corpora"],"prefix":"10.1007","volume":"43","author":[{"given":"Marco","family":"Baroni","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Silvia","family":"Bernardini","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Adriano","family":"Ferraresi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eros","family":"Zanchetta","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2009,2,10]]},"reference":[{"key":"9081_CR1","doi-asserted-by":"crossref","DOI":"10.1007\/978-94-010-0844-0","volume-title":"Word frequency distributions","author":"A. Baayen","year":"2001","unstructured":"Baayen, A. (2001). Word frequency distributions. Dordrecht: Kluwer."},{"key":"9081_CR2","unstructured":"Baroni, M., & Bernardini, S. (Eds.). (2006). Wacky! Working papers on the web as corpus. 
Bologna: Gedit."},{"key":"9081_CR3","doi-asserted-by":"crossref","unstructured":"Baroni, M., & Kilgarriff, A. (2006). Large linguistically-processed web corpora for multiple languages. In Proceedings of the 11th conference of the European chapter of the association for computational linguistics, Trento, Italy, pp. 87\u201390.","DOI":"10.3115\/1608974.1608976"},{"key":"9081_CR4","unstructured":"Baroni, M., & Ueyama, M. (2006). Building general- and special-purpose corpora by web crawling. In Proceedings of the 13th NIJL international symposium, language corpora: Their compilation and application, Tokyo, Japan, pp. 31\u201340."},{"key":"9081_CR5","doi-asserted-by":"crossref","unstructured":"Boleda, G., Bott, S., Meza, R., Castillo, C., Badia, T., & L\u00f3pez, V. (2006). CUCWeb: A Catalan corpus built from the web. In Kilgarriff and Baroni (2006), pp. 19\u201326.","DOI":"10.3115\/1628297.1628301"},{"key":"9081_CR6","volume-title":"Web 1T 5-gram, version 1","author":"T. Brants","year":"2006","unstructured":"Brants, T., & Franz, A. (2006). Web 1T 5-gram, version 1. Philadelphia: Linguistic Data Consortium."},{"key":"9081_CR7","doi-asserted-by":"crossref","unstructured":"Broder, A., Glassman, S., Manasse, M., & Zweig, G. (1997). Syntactic clustering of the web. In Proceedings of the sixth international world wide web conference, Santa Clara, California, pp. 391\u2013404.","DOI":"10.1016\/S0169-7552(97)00031-7"},{"key":"9081_CR8","unstructured":"Ciaramita, M., & Baroni, M. (2006). Measuring web corpus randomness: A progress report. In Baroni and Bernardini (2006), pp. 127\u2013158."},{"key":"9081_CR9","doi-asserted-by":"crossref","unstructured":"Clarke, C., Cormack, G., Laszlo, M., Lynam, T., & Terra, E. (2002). The impact of corpus size on question answering performance. In Proceedings of the 25th annual international ACM SIGIR conference on research and development in information retrieval, Tampere, Finland, pp. 
369\u2013370.","DOI":"10.1145\/564376.564448"},{"issue":"1","key":"9081_CR10","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1145\/1067268.1067274","volume":"39","author":"C. Clarke","year":"2005","unstructured":"Clarke, C., Craswell, N., & Soboroff, I. (2005). The TREC terabyte retrieval track. SIGIR Forum, 39(1), 25.","journal-title":"SIGIR Forum"},{"issue":"1","key":"9081_CR11","first-page":"61","volume":"19","author":"T. Dunning","year":"1993","unstructured":"Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61\u201374.","journal-title":"Computational Linguistics"},{"key":"9081_CR12","unstructured":"Emerson, T., & O\u2019Neil, J. (2006). Experience building a large corpus for Chinese lexicon construction. In Baroni and Bernardini (2006), pp. 41\u201362."},{"key":"9081_CR13","unstructured":"Fairon, C., Naets, H., Kilgarriff, A., & de Schryver, G.-M. (Eds.). (2007). Building and exploring web corpora. In Proceedings of the 3rd web as corpus workshop, incorporating Cleaneval. Louvain: Presses Universitaires de Louvain."},{"key":"9081_CR14","unstructured":"Ferraresi, A. (2007). Building a very large corpus of English obtained by web crawling: ukWaC. MA Dissertation, University of Bologna. Retrieved January 28, 2008, from http:\/\/wacky.sslmit.unibo.it"},{"key":"9081_CR15","first-page":"191","volume-title":"Corpus linguistics in North America 2002","author":"W. Fletcher","year":"2004","unstructured":"Fletcher, W. (2004). Making the web more useful as a source for linguistic corpora. In U. Connor & T. Upton (Eds.), Corpus linguistics in North America 2002 (pp. 191\u2013205). Amsterdam: Rodopi."},{"key":"9081_CR16","volume-title":"Corpus linguistics and the web","year":"2007","unstructured":"Hundt, M., Nesselhauf, N., & Biewer, C. (Eds.). (2007). Corpus linguistics and the web. Amsterdam: Rodopi."},{"key":"9081_CR17","doi-asserted-by":"crossref","unstructured":"Kilgarriff, A., & Baroni, M. 
(Eds.). (2006). Proceedings of the 2nd international workshop on the web as corpus. East Stroudsburg, PA: ACL.","DOI":"10.3115\/1628297"},{"issue":"3","key":"9081_CR18","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1162\/089120103322711569","volume":"29","author":"A. Kilgarriff","year":"2003","unstructured":"Kilgarriff, A., & Grefenstette, G. (2003). Introduction to the special issue on the web as corpus. Computational Linguistics, 29(3), 333\u2013347.","journal-title":"Computational Linguistics"},{"key":"9081_CR19","doi-asserted-by":"crossref","unstructured":"Kornai, A., Hal\u00e1csy, P., Nagy, V., Oravecz, C., Tr\u00f3n, V., & Varga, D. (2006). Web-based frequency dictionaries for medium density languages. In Kilgarriff and Baroni (2006), pp. 1\u20138.","DOI":"10.3115\/1628297.1628298"},{"issue":"3","key":"9081_CR20","first-page":"37","volume":"5","author":"D. Lee","year":"2001","unstructured":"Lee, D. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning & Technology, 5(3), 37\u201372.","journal-title":"Language Learning & Technology"},{"key":"9081_CR21","unstructured":"Liu, V., & Curran, J. (2006). Web text corpus for natural language processing. In Proceedings of the 11th conference of the European chapter of the association for computational linguistics. Trento, Italy, pp. 233\u2013240."},{"key":"9081_CR22","unstructured":"Santini, M., & Sharoff, S. (Eds.). (2007). Proceedings of the CL 2007 colloquium: Towards a reference corpus of web genres, Birmingham, UK."},{"key":"9081_CR23","unstructured":"Shaoul, C., & Westbury, C. 2007. A USENET corpus (2005\u20132007). Retrieved January 28, 2008, from http:\/\/www.psych.ualberta.ca\/~westburylab\/downloads\/usenetcorpus.download.html"},{"key":"9081_CR24","unstructured":"Sharoff, S. (2006). Creating general-purpose corpora using automated search engine queries. In Baroni and Bernardini (2006), pp. 
63\u201398."},{"issue":"1","key":"9081_CR25","first-page":"71","volume":"9","author":"J. McH. Sinclair","year":"1996","unstructured":"Sinclair, J. McH. (1996). The search for units of meaning. Textus 9(1), 71\u2013106.","journal-title":"Textus"},{"key":"9081_CR26","first-page":"1","volume-title":"Developing linguistic corpora: A guide to good practice.","author":"J. McH. Sinclair","year":"2005","unstructured":"Sinclair, J. McH. (2005). Corpus and text\u2014Basic principles. In M. Wynne (Ed.), Developing linguistic corpora: A guide to good practice (pp. 1\u201316). Oxford: Oxbow Books."},{"issue":"4","key":"9081_CR27","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1075\/ijcl.10.4.07the","volume":"10","author":"M. Thelwall","year":"2005","unstructured":"Thelwall, M. (2005). Creating and using web corpora. International Journal of Corpus Linguistics, 10(4), 517\u2013541.","journal-title":"International Journal of Corpus Linguistics"},{"key":"9081_CR28","unstructured":"Ueyama, M. (2006). Evaluation of Japanese web-based reference corpora: Effects of seed selection and time interval. In Baroni and Bernardini (2006), pp. 
99\u2013126."}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-009-9081-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10579-009-9081-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-009-9081-4","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,5,30]],"date-time":"2019-05-30T14:21:15Z","timestamp":1559226075000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10579-009-9081-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,2,10]]},"references-count":28,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2009,9]]}},"alternative-id":["9081"],"URL":"https:\/\/doi.org\/10.1007\/s10579-009-9081-4","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"value":"1574-020X","type":"print"},{"value":"1574-0218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,2,10]]}}}