{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T19:54:45Z","timestamp":1759694085090,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2014,10,3]],"date-time":"2014-10-03T00:00:00Z","timestamp":1412294400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2014,10,3]]},"abstract":"<jats:p>Stemming is a basic method for morphological normalization of natural language texts. In this study, we focus on the problem of stemming several resource-poor languages from Eastern India, viz., Assamese, Bengali, Bishnupriya Manipuri and Bodo. While Assamese, Bengali and Bishnupriya Manipuri are Indo-Aryan, Bodo is a Tibeto-Burman language. We design a rule-based approach to remove suffixes from words. To reduce over-stemming and under-stemming errors, we introduce a dictionary of frequent words. We observe that, for these languages, a large proportion of suffixes are single letters, which creates problems during suffix stripping. As a result, we introduce an HMM-based hybrid approach to classify the mismatched last character. For each word, the stem is extracted by computing the most probable path through four HMM states. At each step, we measure the stemming accuracy for each language. Using the hybrid approach, we obtain 94% accuracy for Assamese and Bengali, and 87% and 82% for Bishnupriya Manipuri and Bodo, respectively. We compare our work with Morfessor [Creutz and Lagus 2005]. To date, there is no reported work on stemming for Bishnupriya Manipuri and Bodo. 
Our results on Assamese and Bengali show significant improvement over prior published work [Sarkar and Bandyopadhyay 2008; Sharma et al. 2002, 2003].<\/jats:p>","DOI":"10.1145\/2629670","type":"journal-article","created":{"date-parts":[[2014,10,7]],"date-time":"2014-10-07T12:57:47Z","timestamp":1412686667000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Stemming resource-poor Indian languages"],"prefix":"10.1145","volume":"13","author":[{"given":"Navanath","family":"Saharia","sequence":"first","affiliation":[{"name":"Tezpur University"}]},{"given":"Utpal","family":"Sharma","sequence":"additional","affiliation":[{"name":"Tezpur University"}]},{"given":"Jugal","family":"Kalita","sequence":"additional","affiliation":[{"name":"University of Colorado, Colorado Springs"}]}],"member":"320","published-online":{"date-parts":[[2014,10,3]]},"reference":[{"volume-title":"Reduplicative Structures: A phenomenon of the South Asian linguistic area","year":"1985","author":"Abbi A.","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1460027.1460030"},{"volume-title":"Proceedings of the 7th conference on International Language Resources and Evaluation. 811--815","author":"Aswani N.","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","unstructured":"L. S. Bora. 2006. Asamiya Bhasar Ruptattva. M\/s Banalata Guwahati Assam India."},{"volume-title":"Advances in Information Retrieval: 25th European Conference on IR Research. Fabrizio Sebastiani, Ed., Springer, 177--192","author":"Braschler M.","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118647.1118650"},{"volume-title":"Tech. Rep. Helsinki University of Technology.","year":"2005","author":"Creutz M.","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","unstructured":"A. Das and S. Bandyopadhyay. 2010. 
Morphological stemming cluster identification for Bangla. In Knowledge Sharing Event-1 Task-3: Morphological Analysers and Generators Vol. 3 Mysore India."},{"volume-title":"Proceedings of the 5th International Conference on Natural Language Processing. 60--66","author":"Dasgupta S.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"B. T. Din\u00e7er and B. Karaoglan. 2003. Stemming in agglutinative languages: A probabilistic stemmer for Turkish. In Computer and Information Sciences Adnan Yazici and Cevat Sener Eds. Springer 244--251.","DOI":"10.1007\/978-3-540-39737-3_31"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2009.06.001"},{"volume-title":"Proceedings of the Workshop on NER for South and South East Asian Languages in Collaboration with 3rd International Joint Conference on Natural Language Processing. 51--58","author":"Ekbal A.","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","unstructured":"U. Goswami. 2001. Asamiya Bhashar Vyakaran. Mani Manik Prakash Guwahati Assam India."},
{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199601)47:1%3C70::AID-ASI7%3E3.3.CO;2-Q"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1031171.1031285"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/243199.243209"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00101-0"},{"volume-title":"Proceedings of the Conference on Empirical Methods on Natural Language Processing. 230--237","author":"Kudo T.","key":"e_1_2_1_19_1"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5120\/1634-2196"},{"volume-title":"Proceedings of the International Conference on Machine Learning. 282--289","author":"Lafferty J.","key":"e_1_2_1_21_1"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564425"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1177\/016555158100300403"},{"key":"e_1_2_1_24_1","first-page":"22","article-title":"Development of a stemming algorithm","volume":"11","author":"Lovins J. B.","year":"1968","journal-title":"Mechanical Trans. Computat. Linguis."},{"key":"e_1_2_1_25_1","first-page":"2716","article-title":"Discovering suffixes: A case study for Marathi language","volume":"04","author":"Majgaonker M. M.","year":"2010","journal-title":"Int. J. Comput. Sci. Eng."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281485.1281489"},{"key":"e_1_2_1_27_1","unstructured":"T. McFadden. 2004. The position of morphological case in the derivation: a study on the syntax-morphology interface. Ph.D. Dissertation University of Pennsylvania."},
{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"P. McNamee J. Mayfield and C. D. Piatko. 2001. A language-independent approach to European text retrieval. In Revised Papers from the Workshop of Cross-Language Evaluation Forum on Cross-Language Information Retrieval and Evaluation. 129--139.","DOI":"10.1007\/3-540-44645-1_12"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956889"},{"key":"e_1_2_1_30_1","first-page":"1","article-title":"Free word order, morphological case, and sympathy theory","volume":"11","author":"M\u00fcller G.","year":"2002","journal-title":"Resolving Conflicts in Grammars: Optimality Theory in Syntax, Morphology, and Phonology"},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"D. W. Oard G. Levow and C. I. Cabezas. 2001. CLEF \u201900 experiments at the University of Maryland: Statistical stemming and backoff translation strategies. In Cross-Language Information Retrieval and Evaluation Carol Peters Ed. Springer 176--187.","DOI":"10.1007\/3-540-44645-1_17"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2037661.2037664"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1967293.1967295"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390749.1390765"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"key":"e_1_2_1_36_1","unstructured":"M. F. Porter. 2012. 
Stemming algorithms for various European languages. http:\/\/snowball.tartarus.org\/texts\/stemmersoverview.html."},{"key":"e_1_2_1_37_1","unstructured":"A. Prince and P. Smolensky. 1993. Optimality theory: Constraint interaction in generative grammar. Tech. Rep. Rutgers University."},{"key":"e_1_2_1_38_1","unstructured":"V. S. Ram and S. L. Devi. 2010. Malayalam Stemmer. In Morphological Analysers and Generators Mona Parakh Ed. 105--113."},{"volume-title":"Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, on Computational Linguistics for South Asian Languages. 43--48","author":"Ramanathan A.","key":"e_1_2_1_39_1"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075096.1075146"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-37247-6_14"},{"volume-title":"Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages. 65--72","author":"Sarkar S.","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(1999)50:10%3C944::AID-ASI9%3E3.3.CO;2-H"},{"volume-title":"Proceedings of 2nd National Conference on Computational Intelligence and Signal Processing. 91--94","author":"Sharma P.","key":"e_1_2_1_45_1"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118647.1118648"},{"volume-title":"Proceedings of the 6th International Conference on Computational Intelligence and Natural Computing. 
1593--1596","author":"Sharma U.","key":"e_1_2_1_47_1"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1386869.1386871"},{"key":"e_1_2_1_49_1","unstructured":"K. P. Sinha. 1982. The Bishnupriya Manipuri Language (1st Ed.). Firma KLM Private Limited Calcutta India."},{"volume-title":"Proceedings of the International Conference on Recent Advances on Natural Language Processing. 411--415","author":"\u0160najder J.","key":"e_1_2_1_50_1"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ITCC.2005.90"},{"volume-title":"Proceedings of the Conference on Empirical Methods on Natural Language Processing","author":"Uchimoto K.","key":"e_1_2_1_52_1"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1967.1054010"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622153.1622162"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075218.1075245"}],"container-title":["ACM Transactions on Asian Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629670","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2629670","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:30Z","timestamp":1750227210000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629670"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,10,3]]},"references-count":54,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2014,10,3]]}},"alternative-id":["10.1145\/2629670"],"URL":"https:\/\/doi.org\/10.1145\/2629670","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2014,10,3]]},"assertion":[{"value":"2013-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}