{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T14:12:31Z","timestamp":1761401551321,"version":"3.41.0"},"reference-count":20,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2011,6,1]],"date-time":"2011-06-01T00:00:00Z","timestamp":1306886400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2011,6]]},"abstract":"<jats:p>Stemming is a mechanism of word form normalization that transforms the variant word forms to their common root. In an Information Retrieval system, it is used to increase the system\u2019s performance, specifically the recall and desirably the precision. Although its usefulness is shown to be mixed in languages such as English, for morphologically complex languages stemming produces a significant performance improvement. A number of linguistic rule-based stemmers are available for most European languages which employ a set of rules to get back the root word from its variants. But for Indian languages which are highly inflectional in nature, devising a linguistic rule-based stemmer needs some additional resources which are not available. We present an approach which is purely corpus based and finds the equivalence classes of variant words in an unsupervised manner. 
A set of experiments on four languages using FIRE, CLEF, and TREC test collections shows that our approach provides comparable results with linguistic rule-based stemmers for some languages and gives significant performance improvement for resource constrained languages such as Bengali and Marathi.<\/jats:p>","DOI":"10.1145\/1967293.1967295","type":"journal-article","created":{"date-parts":[[2011,6,28]],"date-time":"2011-06-28T17:31:10Z","timestamp":1309282270000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["A Fast Corpus-Based Stemmer"],"prefix":"10.1145","volume":"10","author":[{"given":"Jiaul H.","family":"Paik","sequence":"first","affiliation":[{"name":"Indian Statistical Institute"}]},{"given":"Swapan K.","family":"Parui","sequence":"additional","affiliation":[{"name":"Indian Statistical Institute"}]}],"member":"320","published-online":{"date-parts":[[2011,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Almeida A. and Bhattacharyya P. 2008. Using morphology to improve Marathi monolingual information retrieval. In FIRE Working Note. Almeida A. and Bhattacharyya P. 2008. Using morphology to improve Marathi monolingual information retrieval. In FIRE Working Note ."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582416"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075218.1075244"},{"key":"e_1_2_1_4_1","unstructured":"Dolamic L. and Jacques S. 2008. Unine at fire 2008: Hindi Bengali and Marathi IR. In FIRE Working Note. Dolamic L. and Jacques S. 2008. Unine at fire 2008: Hindi Bengali and Marathi IR. 
In FIRE Working Note ."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120101750300490"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199601)47:1%3C70::AID-ASI7%3E3.3.CO;2-Q"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/243199.243209"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/160688.160718"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/974740.974746"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281485.1281489"},{"volume-title":"Proceedings of the 8th Workshop of the Cross-Language Evaluation Forum (CLEF\u201907)","author":"Majumder P.","key":"e_1_2_1_12_1","unstructured":"Majumder , P. , Mitra , M. , and Pal , D . 2008. Bulgarian, Hungarian and Czech stemming using Yass . In Proceedings of the 8th Workshop of the Cross-Language Evaluation Forum (CLEF\u201907) . 49--56. Majumder, P., Mitra, M., and Pal, D. 2008. Bulgarian, Hungarian and Czech stemming using Yass. In Proceedings of the 8th Workshop of the Cross-Language Evaluation Forum (CLEF\u201907). 49--56."},{"key":"e_1_2_1_13_1","unstructured":"Manning C. D. Raghavan P. and Sch\u00fctze H. 2008. Introduction to Information Retrieval. Cambridge University Press Cambridge UK. Manning C. D. Raghavan P. and Sch\u00fctze H. 2008. Introduction to Information Retrieval . Cambridge University Press Cambridge UK."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:INRT.0000009441.78971.be"},{"volume-title":"Proceedings of the Workshop on Cross-Language Evaluation Forum (CLEF\u201900)","author":"Oard D. W.","key":"e_1_2_1_15_1","unstructured":"Oard , D. W. , Levow , G.-A. , and Cabezas , C. I . 2000. Clef experiments at Maryland: Statistical stemming and backoff translation . 
In Proceedings of the Workshop on Cross-Language Evaluation Forum (CLEF\u201900) . 176--187. Oard, D. W., Levow, G.-A., and Cabezas, C. I. 2000. Clef experiments at Maryland: Statistical stemming and backoff translation. In Proceedings of the Workshop on Cross-Language Evaluation Forum (CLEF\u201900). 176--187."},{"volume-title":"Proceedings of the 6th Workshop of the Cross-Language Evaluation Forum (CLEF\u201905)","author":"Peters C.","key":"e_1_2_1_16_1","unstructured":"Peters , C. , Gey , F. , Gonzalo , J. , Mueller , H. , Jones , G. , Kluck , M. , Magnini , B. , and Rijke , M. D . 2006. Accessing multilingual information repositories . In Proceedings of the 6th Workshop of the Cross-Language Evaluation Forum (CLEF\u201905) . Peters, C., Gey, F., Gonzalo, J., Mueller, H., Jones, G., Kluck, M., Magnini, B., and Rijke, M. D. 2006. Accessing multilingual information repositories. In Proceedings of the 6th Workshop of the Cross-Language Evaluation Forum (CLEF\u201905)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199206)43:5<384::AID-ASI6>3.0.CO;2-L"},{"volume-title":"Proceedings of the 10th Conference of the European Chapter of the Computational Linguistics for South Asian Languages (EACL\u201903)","author":"Ramanathan A.","key":"e_1_2_1_18_1","unstructured":"Ramanathan , A. and Rao , D . 2003. A lightweight stemmer for Hindi . In Proceedings of the 10th Conference of the European Chapter of the Computational Linguistics for South Asian Languages (EACL\u201903) . Ramanathan, A. and Rao, D. 2003. A lightweight stemmer for Hindi. 
In Proceedings of the 10th Conference of the European Chapter of the Computational Linguistics for South Asian Languages (EACL\u201903)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2007.01.022"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/267954.267957"}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1967293.1967295","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1967293.1967295","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:52:21Z","timestamp":1750243941000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1967293.1967295"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,6]]},"references-count":20,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,6]]}},"alternative-id":["10.1145\/1967293.1967295"],"URL":"https:\/\/doi.org\/10.1145\/1967293.1967295","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2011,6]]},"assertion":[{"value":"2010-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}