{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,13]],"date-time":"2025-07-13T00:04:03Z","timestamp":1752365043042,"version":"3.41.2"},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2002,5,1]],"date-time":"2002-05-01T00:00:00Z","timestamp":1020211200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2002,5,1]],"date-time":"2002-05-01T00:00:00Z","timestamp":1020211200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers and the Humanities"],"published-print":{"date-parts":[[2002,5]]},"DOI":"10.1023\/a:1014344527505","type":"journal-article","created":{"date-parts":[[2002,12,28]],"date-time":"2002-12-28T15:15:16Z","timestamp":1041088516000},"page":"171-190","source":"Crossref","is-referenced-by-count":1,"title":["On the Corpus Size Needed for Compiling a Comprehensive Computational Lexicon by Automatic Lexical Acquisition"],"prefix":"10.1007","volume":"36","author":[{"given":"Dan-Hee","family":"Yang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ik-Hwan","family":"Lee","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pascual","family":"Cantos","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","reference":[{"key":"357164_CR1","unstructured":"Church, K.W. and R.L. Mercer. \u201cIntroduction to the Special Issue on Computational Linguistics Using Large Corpora\u201d. In Using Large Corpora. Ed. Susan Armstrong. The MIT Press, 1994, pp. 
1\u201324."},{"key":"357164_CR2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1515\/9783110878202.3","volume-title":"New Directions in English Language Corpora, Methodology Results, Software Development","author":"P. De Haan","year":"1992","unstructured":"De Haan, P. \u201cThe Optimum Corpus Sample Size?\u201d In New Directions in English Language Corpora, Methodology Results, Software Development. Ed. Gerhard Leitner, Gerhard. New York: Mouton de Gruyte, 1992, pp. 3\u201319."},{"key":"357164_CR3","first-page":"42","volume-title":"Statistics","author":"W. Hays","year":"1994","unstructured":"Hays, W. Statistics. Florida: Harcourt Brace College Publishers, 1994, pp. 42\u201347, 94\u2013111."},{"key":"357164_CR4","first-page":"206","volume-title":"Information Retrieval: Computational and Theoretical Aspects","author":"H.S. Heaps","year":"1978","unstructured":"Heaps, H.S. Information Retrieval: Computational and Theoretical Aspects. New York: Academic Press, 1978, pp. 206\u2013208."},{"key":"357164_CR5","first-page":"7","volume-title":"Lexicographic Study","author":"C.-S. Jeong","year":"1990","unstructured":"Jeong, C.-S., S.-S. Lee, K.-S. Nam et al. \u201cSelection Criteria of Sampling for Frequency Survey in Korean Words\u201d. In Lexicographic Study, Vol. 3. Seoul: Tap Press, 1990, pp. 7\u201369."},{"key":"357164_CR6","first-page":"134","volume-title":"Lexicographic Study","author":"Y.-M. Jeong","year":"1995","unstructured":"Jeong, Y.-M. \u201cStatistical Characteristics of Korean Vocabulary and Its Application\u201d. In Lexicographic Study, Vol. 5, 6. Seoul: Tap Press, 1995, pp. 134\u2013163."},{"issue":"1","key":"357164_CR7","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1017\/S1351324996001246","volume":"2","author":"S. Katz","year":"1996","unstructured":"Katz, S. \u201cDistribution of Content Words and Phrases in Text and Language Modelling\u201d. Journal of Natural Language Engineering 2(1) (1996), pp. 
15\u201359.","journal-title":"Journal of Natural Language Engineering"},{"key":"357164_CR8","unstructured":"Kwen, H.-C. \u201cPerformance Improvement of Korean Information Processing System by Using a Corpus. Corpus and the Korean Language Information\u201d. In The 9th Annual Meeting of Korean Lexicographic Center. Korean Lexicographic Center at Yonsei University, Seoul, 1997."},{"key":"357164_CR9","unstructured":"Lauer, M. \u201cConserving Fuel in Statistical Language Learning: Predicting Data Requirements\u201d. In The 8th Australian Joint Conference on Artificial Intelligence. Canberra, 1995a."},{"key":"357164_CR10","unstructured":"Lauer, M. \u201cHow Much is Enough?: Data Requirements for Statistical NLP\u201d. In 2th Conference of the Pacific Association for Computational Linguistics. Brisbane, Australia."},{"key":"357164_CR11","first-page":"47","volume-title":"An Introduction to French Linguistics","author":"H.-Y. Lee","year":"1974","unstructured":"Lee, H.-Y. An Introduction to French Linguistics. Seoul: Jeong-Eum-Sa, 1974, p. 47."},{"key":"357164_CR12","unstructured":"Resnik, P. Selection and Information: A Class-Based Approach to Lexical Relationships. Ph.D. Dissertation of Department of Computer and Information Science. Pennsylvania University, 1993, pp. 6\u201333."},{"issue":"2","key":"357164_CR13","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1075\/ijcl.2.2.06san","volume":"2","author":"A. S\u00e1nchez","year":"1997","unstructured":"S\u00e1nchez, A. and P. Cantos. \u201cPredictability of Word Forms (Types) and Lemmas in Linguistic Corpora. A Case Study Based on the Analysis of the CUMBRE Corpus: An 8-Million-Word Corpus of Contemporary Spanish\u201d, International Journal of Corpus Linguistics, 2(2) (1997), pp. 259\u2013280.","journal-title":"International Journal of Corpus Linguistics"},{"issue":"1","key":"357164_CR14","first-page":"205","volume":"XIX","author":"A. S\u00e1nchez","year":"1998","unstructured":"S\u00e1nchez, A. and P. Cantos. 
\u201cEl ritmo incremental de palabras nuevas en los repertorios de textos. Estudio experimental y comparativo basado en dos corpus ling\u00fc\u00edsticos equivalentes de cuatro millones de palabras, de las lenguas inglesa y espa\u00f1ola y en cinco autores de ambas lenguas\u201d. In Atlantis (Revista de la Asociaci\u00f3n Espa\u00f1ola de Estudios Anglo-Norteamericanos), Vol. XIX(1). Spain, 1998, pp. 205\u2013223.","journal-title":"Atlantis (Revista de la Asociaci\u00f3n Espa\u00f1ola de Estudios Anglo-Norteamericanos)"},{"key":"357164_CR15","unstructured":"Stewart, I. and D. Tall. The Foundations of Mathematics. Oxford University Press, 1977, pp. 41\u201361."},{"key":"357164_CR16","volume-title":"Ulimal Keun Dictionary","author":"The Society of Hangul.","year":"1992","unstructured":"The Society of Hangul. \u2018Ulimal Keun Dictionary. Seoul: Eomun-gak Press, 1992."},{"key":"357164_CR17","unstructured":"Weischedel, R. et al. \u201cCoping with Ambiguity and Unknown Words through Probabilistic Models\u201d. In Using Large Corpora. Ed. Susan Armstrong. The MIT Press, 1994, pp. 323\u2013326."},{"key":"357164_CR18","unstructured":"Yang, D.-H. and M. Song. \u201cHow Much Training Data Is Required to Remove Data Sparseness in Statistical language Learning?\u201d In Proceedings of the First Workshop on Text, Speech, Dialogue (TSD'98). Bruno, 1998, pp. 141\u2013146."},{"issue":"4","key":"357164_CR19","first-page":"568","volume":"26","author":"D.-H. Yang","year":"1999","unstructured":"Yang, D.-H., S.-J. Lim and M. Song. \u201cThe Estimate of the Corpus Size for Solving Data Sparseness\u201d. Journal of KISS, 26(4) (1999a), pp. 568\u2013583.","journal-title":"Journal of KISS"},{"issue":"2","key":"357164_CR20","first-page":"161","volume":"12","author":"D.-H. Yang","year":"1999","unstructured":"Yang, D.-H. and M. Song. \u201cRepresentation and Acquisition of the Word Meaning for Picking out Thematic Roles\u201d. 
International Journal of Computer Processing of Oriental Languages (CPOL), 12(2) (1999b), pp. 161\u2013177 (the Oriental Languages Computer Society).","journal-title":"International Journal of Computer Processing of Oriental Languages (CPOL)"},{"key":"357164_CR21","doi-asserted-by":"crossref","unstructured":"Yang, D.-H., P.C. G\u00f3mez and M. Song. \u201cAn Algorithm for Predicting the Relationship between Lemmas and Corpus Size\u201d, ETRI Journal, 22(2) (2000) (Electronics and Telecommunications Research Institute).","DOI":"10.4218\/etrij.00.0100.0203"},{"key":"357164_CR22","unstructured":"Zernik, U. Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon. Lawrence Erlbaum Associates, 1991, pp. 1\u201326."}],"container-title":["Computers and the Humanities"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1023\/A:1014344527505.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1023\/A:1014344527505\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1023\/A:1014344527505.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,12]],"date-time":"2025-07-12T21:43:01Z","timestamp":1752356581000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1023\/A:1014344527505"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2002,5]]},"references-count":22,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2002,5]]}},"alternative-id":["357164"],"URL":"https:\/\/doi.org\/10.1023\/a:1014344527505","relation":{},"ISSN":["0010-4817","1572-8412"],"issn-type":[{"type":"print","value":"0010-4817"},{"type":"electronic","value":"1572-8412"}],"subject":[],"published":{"date-parts":[[2002,5]]}
}}