{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T18:42:32Z","timestamp":1776278552493,"version":"3.50.1"},"reference-count":44,"publisher":"Emerald","issue":"4","license":[{"start":{"date-parts":[[2005,8,1]],"date-time":"2005-08-01T00:00:00Z","timestamp":1122854400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,8,1]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-heading\">Purpose<\/jats:title><jats:p>To show that stem generation compares well with lemmatization as a morphological tool for a highly inflectional language for IR purposes in a best\u2010match retrieval system.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Design\/methodology\/approach<\/jats:title><jats:p>Effects of three different morphological methods \u2013 lemmatization, stemming and stem production \u2013 for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four\u2010point relevance scale which is partitioned differently in different test settings.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Findings<\/jats:title><jats:p>Results show that stem production, a lighter method than morphological lemmatization, compares well with lemmatization in a best\u2010match IR environment. Differences in performance between stem production and lemmatization are small and they are not statistically significant in most of the tested settings. It is also shown that hitherto a rather neglected method of morphological processing for Finnish, stemming, performs reasonably well although the stemmer used \u2013 a Porter stemmer implementation \u2013 is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Practical implications<\/jats:title><jats:p>Usefulness of morphological lemmatization and stem generation for IR purposes can be estimated with many factors. On the average P\u2010R level they seem to behave very close to each other in a probabilistic IR system. Thus, the choice of the used method with highly inflectional languages needs to be estimated along other dimensions too.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Originality\/value<\/jats:title><jats:p>Results are achieved using Finnish as an example of a highly inflectional language. The results are of interest for anyone who is interested in processing of morphological variation of a highly inflected language for IR purposes.<\/jats:p><\/jats:sec>","DOI":"10.1108\/00220410510607480","type":"journal-article","created":{"date-parts":[[2005,8,9]],"date-time":"2005-08-09T00:30:40Z","timestamp":1123547440000},"page":"476-496","source":"Crossref","is-referenced-by-count":20,"title":["To stem or lemmatize a highly inflectional language in a probabilistic IR environment?"],"prefix":"10.1108","volume":"61","author":[{"given":"Kimmo","family":"Kettunen","sequence":"first","affiliation":[]},{"given":"Tuomas","family":"Kunttu","sequence":"additional","affiliation":[]},{"given":"Kalervo","family":"J\u00e4rvelin","sequence":"additional","affiliation":[]}],"member":"140","reference":[{"key":"key2022021120302710900_b1","doi-asserted-by":"crossref","unstructured":"Abu\u2010Salem, H., Al\u2010Omari, M. and Evens, M. (1999), \u201cStemming methodologies over individual query words for an Arabic information retrieval system\u201d, Journal of the American Society for Information Science, Vol. 50 No. 6, pp. 524\u20109.","DOI":"10.1002\/(SICI)1097-4571(1999)50:6<524::AID-ASI7>3.0.CO;2-M"},{"key":"key2022021120302710900_b2","unstructured":"Airio, E., Keskustalo, H., Hedlund, T. and Pirkola, A. (2003), \u201cMultilingual experiments of UTA at CLEF 2003. The impact of different merging strategies and word normalizing tools\u201d, in Peters, C. and Borri, F. (Eds), Results of the CLEF 2003 Evaluation Campaign, Cross\u2010Language Evaluation Forum, pp. 13\u201018."},{"key":"key2022021120302710900_b3","doi-asserted-by":"crossref","unstructured":"Alemayehu, N. and Willet, P. (2003), \u201cThe effectiveness of stemming for information retrieval in Amharic\u201d, Program, Vol. 37 No. 4, pp. 254\u20109.","DOI":"10.1108\/00330330310500748"},{"key":"key2022021120302710900_b4","unstructured":"Alkula, R. (2000), \u201cMerkkijonoista suomen kielen sanoiksi\u201d, Acta Universitatis Tamperensis 763, available at: http:\/\/acta.uta.fi\/pdf\/951\u201044\u20104886\u20103.pdf (accessed 31 March 2004)."},{"key":"key2022021120302710900_b5","doi-asserted-by":"crossref","unstructured":"Alkula, R. (2001), \u201cFrom plain character strings to meaningful words: producing better full text databases for inflectional and compounding languages with morphological analysis software\u201d, Information Retrieval, Vol. 4, pp. 195\u2010208.","DOI":"10.1023\/A:1011942104443"},{"key":"key2022021120302710900_b6","unstructured":"Baeza\u2010Yates, R. and Ribeiro\u2010Neto, B. (1999), Modern Information Retrieval, ACM Press, New York, NY."},{"key":"key2022021120302710900_b7","unstructured":"Bl\u00e5berg, O. (1994), \u201cThe ment model \u2013 complex states in Finite State Morphology\u201d, RUUL 27, Reports from Uppsala University, Department of Linguistics, Uppsala."},{"key":"key2022021120302710900_b8","doi-asserted-by":"crossref","unstructured":"Braschler, M. and Ripplinger, B. (2003), \u201cStemming and decompounding for German text retrieval\u201d, Advances in Information Retrieval, paper presented at 25th European Conference on IR Research, ECIR 2003, Pisa, Springer, New York, NY, pp. 177\u201092.","DOI":"10.1007\/3-540-36618-0_13"},{"key":"key2022021120302710900_b9","unstructured":"Broglio, J., Callan, J., Croft, B. and Nachbar, D. (1995), \u201cDocument retrieval and routing using the INQUERY system\u201d, Proceedings of the Third Text Retrieval Conference (TREC\u20103), National Institute of Standards and Technology, special publication 500\u2010225, Gaithesburg, MD, pp. 29\u201038."},{"key":"key2022021120302710900_b10","doi-asserted-by":"crossref","unstructured":"Callan, J., Croft, B. and Harding, S. (1992), \u201cThe INQUERY retrieval system\u201d, Proceedings of the Third International Conference on Databases and Expert Systems Applications, Springer, New York, NY, pp. 78\u201084.","DOI":"10.1007\/978-3-7091-7557-6_14"},{"key":"key2022021120302710900_b11","unstructured":"Conover, W.J. (1980), Practical Nonparametric Statistics, 2nd ed., Wiley, New York, NY."},{"key":"key2022021120302710900_b12","unstructured":"Frakes, W. (1992), \u201cStemming algorithms\u201d, in Frakes, W. and Baeza\u2010Yates, R. (Eds), Information Retrieval. Data Structures and Algorithms, Prentice\u2010Hall, Englewood Cliffs, NJ, pp. 131\u201060."},{"key":"key2022021120302710900_b13","unstructured":"Friedl, J.E.F. (1997), Mastering Regular Expressions, O'Reilly & Associates, Sebastopol, CA."},{"key":"key2022021120302710900_b14","doi-asserted-by":"crossref","unstructured":"Harman, D. (1991), \u201cHow effective is suffixing?\u201d, Journal of the American Society for Information Science, Vol. 42 No. 1, pp. 7\u201015.","DOI":"10.1002\/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P"},{"key":"key2022021120302710900_b15","doi-asserted-by":"crossref","unstructured":"Hull, D. (1996), \u201cStemming algorithms: a case study for detailed evaluation\u201d, Journal of the American Society for Information Science, Vol. 47 No. 1, pp. 70\u201084.","DOI":"10.1002\/(SICI)1097-4571(199601)47:1<70::AID-ASI7>3.0.CO;2-#"},{"key":"key2022021120302710900_b16","doi-asserted-by":"crossref","unstructured":"Jacquemin, C. and Tzoukerman, E. (1999), \u201cNLP for term variant extraction: synergy between morphology, lexicon, and syntax\u201d, in Strzalkowski, T. (Ed.), Natural Language Information Retrieval, Kluwer, Dordrecht, pp. 25\u201074.","DOI":"10.1007\/978-94-017-2388-6_2"},{"key":"key2022021120302710900_b17","doi-asserted-by":"crossref","unstructured":"Jansen, B., Spink, A. and Sarasevic, T. (2000), \u201cReal life, real users, and real needs: a study and analysis of user queries on the web\u201d, Information Processing and Management, Vol. 36, pp. 207\u201027.","DOI":"10.1016\/S0306-4573(99)00056-4"},{"key":"key2022021120302710900_b18","unstructured":"J\u00e4ppinen, H. and Ylilammi, M. (1986), \u201cAssociative model of morphological analysis: an empirical inquiry\u201d, Computational Linguistics, Vol. 12 No. 4, pp. 257\u201072."},{"key":"key2022021120302710900_b19","unstructured":"Karlsson, F. (1983), Suomen kielen \u00e4\u00e4nne\u2010 ja muotorakenne, Wsoy, Helsinki."},{"key":"key2022021120302710900_b20","doi-asserted-by":"crossref","unstructured":"Karlsson, F. (1986), \u201cFrequency considerations in morphology\u201d, Zeitsschrift f\u00fcr Phonetik, Sprachwissenschaft und Kommunikationsforschung, Vol. 39 No. 1, pp. 19\u201028.","DOI":"10.1524\/stuf.1986.39.14.19"},{"key":"key2022021120302710900_b21","unstructured":"Kek\u00e4l\u00e4inen, J. (1999), The Effects of Query Complexity, Expansion and Structure on Retrieval Performance in Probabilistic Retrieval, Acta Universitatis Tamperensis 678, Calgary."},{"key":"key2022021120302710900_b22","unstructured":"Kettunen, K. (1991a), \u201cDoing the stem generation with stemma\u201d, in Niemi, J. (Ed.), Papers from the Eighteenth Finnish Conference of Linguistics, Kielitieteellisi\u00e4 tutkimuksia, Joensuun yliopisto 24 pp. 80\u201097."},{"key":"key2022021120302710900_b23","unstructured":"Kettunen, K. (1991b), \u201cStemma, a robust noun stem generator for Finnish\u201d, Humanistiske Data, No. 1, pp. 26\u201031."},{"key":"key2022021120302710900_b24","doi-asserted-by":"crossref","unstructured":"Koskenniemi, K. (1983), Two\u2010Level Morphology: A General Computational Model for Word\u2010form Recognition and Production, 11, Publications of the Department of General linguistics, University of Helsinki, Helsinki.","DOI":"10.3115\/980431.980529"},{"key":"key2022021120302710900_b25","unstructured":"Koskenniemi, K. (1985), \u201cFINSTEMS: a module for information retrieval\u201d, in Karlsson, F. (Ed.), Computational Morphosyntax, Report on research 1981\u20101984. No. 13, Publications of the Department of General linguistics, University of Helsinki, Helsinki, pp. 81\u201092."},{"key":"key2022021120302710900_b26","doi-asserted-by":"crossref","unstructured":"Kraaij, W. and Pohlmann, R. (1996), \u201cViewing stemming as recall enhancement\u201d, in Frei, H\u2010P., Harman, D., Sch\u00e4uble, P. and Wilkinson, R. (Eds), Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, (SIGIR '96), ACM Press, New York, NY, pp. 40\u20108.","DOI":"10.1145\/243199.243209"},{"key":"key2022021120302710900_b27","doi-asserted-by":"crossref","unstructured":"Krovetz, R. (2000), \u201cViewing morphology as an inference process\u201d, Artificial Intelligence, Vol. 118, pp. 277\u201094.","DOI":"10.1016\/S0004-3702(99)00101-0"},{"key":"key2022021120302710900_b28","unstructured":"Kunttu, T. (2003), \u201cPerus\u2010 ja taivutusmuotohakemiston tuloksellisuus todenn\u00e4k\u00f6isyyksiin perustuvassa tiedonhakuj\u00e4rjestelm\u00e4ss\u00e4\u201d, MSc thesis, Informaatiotutkimuksen pro gradu \u2010tutkielma, Informaatiotutkimuksen laitos, Tampereen yliopisto, Department of Information Studies, University of Tampere."},{"key":"key2022021120302710900_b29","unstructured":"Lovins, J. (1968), \u201cDevelopment of a stemming algorithm\u201d, Mechanical Translation and Computational Linguistics, Vol. 11, pp. 22\u201031."},{"key":"key2022021120302710900_b30","unstructured":"Matthews, P.H. (1991), Morphology, 2nd ed., Cambridge University Press, Cambridge, MA."},{"key":"key2022021120302710900_b31","doi-asserted-by":"crossref","unstructured":"Mayfield, J. and McNamee, P. (2003), \u201cSingle n\u2010gram stemming\u201d, Proceedings of Sigir2003, The Twenty\u2010Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 415\u20106.","DOI":"10.1145\/860435.860528"},{"key":"key2022021120302710900_b32","doi-asserted-by":"crossref","unstructured":"Pirkola, A. (2001), \u201cMorphological typology of languages for IR\u201d, Journal of Documentation, Vol. 57 No. 3, pp. 330\u201048.","DOI":"10.1108\/EUM0000000007085"},{"key":"key2022021120302710900_b33","doi-asserted-by":"crossref","unstructured":"Popovi\u010d, M. and Willet, P. (1992), \u201cThe effectiveness of stemming for natural\u2010language access to slovene textual data\u201d, Journal of the American Society for Information Science, Vol. 43 No. 5, pp. 384\u201090.","DOI":"10.1002\/(SICI)1097-4571(199206)43:5<384::AID-ASI6>3.0.CO;2-L"},{"key":"key2022021120302710900_b34","doi-asserted-by":"crossref","unstructured":"Porter, M. (1980), \u201cAn algorithm for suffix stripping\u201d, Program, Vol. 14, pp. 130\u20107.","DOI":"10.1108\/eb046814"},{"key":"key2022021120302710900_b35","unstructured":"Porter, M. (2001), \u201cSnowball: a language for stemming algorithms\u201d, available at: http:\/\/snowball.tartarus.org\/texts\/introduction.html (accessed 28 November, 2003)."},{"key":"key2022021120302710900_b36","doi-asserted-by":"crossref","unstructured":"Schinke, R., Greengrass, M., Robertson, A. and Willet, P. (1996), \u201cA stemming algorithm for Latin text databases\u201d, Journal of Documentation, Vol. 52 No. 2, pp. 172\u201087.","DOI":"10.1108\/eb026966"},{"key":"key2022021120302710900_b37","doi-asserted-by":"crossref","unstructured":"Sever, H. and Bitirim, Y. (2003), \u201cFindStem: analysis and evaluation of a Turkish stemming algorithm\u201d, in Nascimento, M., Moura, E. and Oliveira, A. (Eds), String Processing and Information Retrieval, paper presented at 10th International Symposium, SPIRE 2003, pp. 238\u201051.","DOI":"10.1007\/978-3-540-39984-1_18"},{"key":"key2022021120302710900_b38","unstructured":"Siegel, S. and Castellan, J. Jr (1988), Nonparametric Statistics for the Behavioural Sciences, McGraw\u2010Hill, New York, NY."},{"key":"key2022021120302710900_b39","doi-asserted-by":"crossref","unstructured":"Silva, G. and Oliveira, C. (2003), \u201cThe implementation and evaluation of a lexicon\u2010based stemmer\u201d, in Nascimento, M., Moura, E. and Oliveira, A. (Eds), String Processing and Information Retrieval, paper presented at 10th International Symposium, SPIRE 2003, pp. 266\u201076.","DOI":"10.1007\/978-3-540-39984-1_20"},{"key":"key2022021120302710900_b40","unstructured":"Sormunen, E. (2000), A Method for Measuring Wide Range Performance of Boolean Queries in Full\u2010Text Databases, Acta Universitatis Tamperensis 748, Calgary."},{"key":"key2022021120302710900_b41","doi-asserted-by":"crossref","unstructured":"Sparck Jones, K. (1974), \u201cAutomatic indexing\u201d, Journal of Documentation, Vol. 30 No. 4, pp. 393\u2010432.","DOI":"10.1108\/eb026588"},{"key":"key2022021120302710900_b45","unstructured":"The Finnish stemming algorithm (n.d.), available at: http:\/\/snowball.tartarus.org\/finnish\/stemmer.html (accessed 28 November 2003)."},{"key":"key2022021120302710900_b43","doi-asserted-by":"crossref","unstructured":"Tomlinson, S. (2002), \u201cExperiments in 8 European languages with Hummingbird SearchServer\u2122 at CLEF 2002\u201d, available at: http:\/\/clef.iei.pi.cnr.it:2002\/workshop2002\/WN\/26.pdf (accessed 28 April 2004).","DOI":"10.1007\/978-3-540-45237-9_20"},{"key":"key2022021120302710900_b44","doi-asserted-by":"crossref","unstructured":"Tomlinson, S. (2003), \u201cLexical and algorithmic stemming compared for 9 European languages with Hummingbird SearchServer\u2122 at CLEF 2003\u201d, available at: http:\/\/clef.iei.pi.cnr.it\/2003\/WN_web\/19.pdf (accessed 28 April 2004).","DOI":"10.1007\/978-3-540-30222-3_27"}],"container-title":["Journal of Documentation"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.emeraldinsight.com\/doi\/full-xml\/10.1108\/00220410510607480","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/00220410510607480\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/00220410510607480\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:37:32Z","timestamp":1753400252000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/jd\/article\/61\/4\/476-496\/200065"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,8,1]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2005,8,1]]}},"alternative-id":["10.1108\/00220410510607480"],"URL":"https:\/\/doi.org\/10.1108\/00220410510607480","relation":{},"ISSN":["0022-0418"],"issn-type":[{"value":"0022-0418","type":"print"}],"subject":[],"published":{"date-parts":[[2005,8,1]]}}}