{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T21:15:54Z","timestamp":1764018954894,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,5,17]],"date-time":"2022-05-17T00:00:00Z","timestamp":1652745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2022,7,31]]},"abstract":"<jats:p>The morphological variations of highly inflected languages that appear in a text impede the progress of computer processing and root word determination tasks while extracting an abstract. As a remedy to this difficulty, a lemmatization algorithm is developed, and its effectiveness is evaluated for Word Sense Disambiguation (WSD). Having observed its usefulness, lemmatizer is considered for developing Natural Language Processing tools for languages rich in morphological variations. Among various Indian highly inflected languages, Assamese, spoken by over 14 million people in the North-Eastern region of India, is also one of them. In this present work, after a detailed study on the possible transformations through which surface words are created from lemmas, we have designed an Assamese lemmatizer in such a manner that suitable reverse transformations can be employed on a surface word to derive the co-relative (similar) lemma back. 
And it has been observed that the lemmatizer is competent to deal with inflectional and derivational morphology in Assamese, and the same was evaluated on various Assamese articles extracted from the Assamese Corpus consisting of 50,000 surface words (excluding proper nouns), and the result that it yielded with 82% accuracy was quite encouraging and satisfying, as Assamese is a low-level language and no research work has been done in the Assamese language regarding the lemmatization of words. Considering the result obtained, the lemmatizer is then evaluated for Assamese WSD. For this purpose, 10 highly polysemous Assamese words are taken into account for sense disambiguation. We have also regarded varied WSD systems and observed that such systems enhance the effectiveness of all the WSD systems, which is statistically significant.<\/jats:p>","DOI":"10.1145\/3502157","type":"journal-article","created":{"date-parts":[[2022,5,17]],"date-time":"2022-05-17T12:07:11Z","timestamp":1652789231000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["A Lemmatizer for Low-resource Languages: WSD and Its Role in the Assamese Language"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0574-9868","authenticated-orcid":false,"given":"Arjun","family":"Gogoi","sequence":"first","affiliation":[{"name":"Dibrugarh University, Dibrugarh, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4965-9933","authenticated-orcid":false,"given":"Nomi","family":"Baruah","sequence":"additional","affiliation":[{"name":"Dibrugarh University, Dibrugarh, India"}]}],"member":"320","published-online":{"date-parts":[[2022,5,17]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"crossref","unstructured":"N. Saharia K. Konwar and J. Kalita. 2013. An improved stemming approach using HMM for a highly inflectional language. 
In Computational Linguistics and Intelligent Text Processing 7816.","DOI":"10.1007\/978-3-642-37247-6_14"},{"key":"e_1_3_1_3_2","unstructured":"Chatterji. 1926. The Origin and Development of the Bengali Language ."},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","unstructured":"M. Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries. In Proceedings of the 5th Annual International Conference on Systems . 24\u201326.","DOI":"10.1145\/318723.318728"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"A. Kilgarriff and J. Rosenzweig. 2000. Framework and results for english SENSEVAL. Comput. Human. 34 1 (2000) 15\u20134.","DOI":"10.1023\/A:1002693207386"},{"key":"e_1_3_1_6_2","unstructured":"S. Seal and N. Joshi. 2019. Design of an inflectional rule-based assamese stemmer. Int. J. Innov. Technol. Explor. Eng. 8 6 (2019) 1651\u20131655."},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"N. Saharia U. Sarmah and J. Kalita. 2012. Analysis and evaluation of stemming algorithms. In Proceedings of the International Conference on Advances in Computing Communications and Informatics . 842\u2013846.","DOI":"10.1145\/2345396.2345533"},{"key":"e_1_3_1_8_2","unstructured":"M. Rahman and S. K. Sarma. 2016. Analysing Morphology of Assamese Words using Finite State Transducer. Int. J. Innov. Res. Comput. Commun. Eng. 4 12 (2016) 21801\u201321807."},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","unstructured":"P. Sharma U. Sarmah and J. Kalita. 2012. Suffix stripping based NER in assamese for location names. In Proceedings of the 2nd National Conference on Computational Intelligence and Signal Processing . 91\u201394.","DOI":"10.1109\/NCCISP.2012.6189684"},{"key":"e_1_3_1_10_2","unstructured":"S. K. Sarma R. Medhi M. Gogoi and U. Saikia. 2010. Foundation and structure of developing an assamese WordNet. 
In Proceedings of the 5th International Global WordNet Conference (GWC\u201910) ."},{"key":"e_1_3_1_11_2","doi-asserted-by":"crossref","unstructured":"K. Koskenniemi. 1984. A general computational model for word-form recognition and production. In Proceedings of the 10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics . 178\u2013181.","DOI":"10.3115\/980491.980529"},{"key":"e_1_3_1_12_2","unstructured":"R. Wicentowski and D. Yarowsky. 2002. Modeling and Learning Multilingual Inflectional Morphology in a Minimally Supervised Framework . Ph.D. Dissertation. Johns Hopkins University Baltimore Maryland."},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"K. Toutanova and C. Cherry. 2009. A global model for joint lemmatization and part-of-speech prediction. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP . 486\u2013494.","DOI":"10.3115\/1687878.1687947"},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","unstructured":"A. Loponen and K. J\u00e4rvelin. 2010. A dictionary-and corpus-independent statistical lemmatizer for information retrieval in low resource languages. In Multilingual and Multimodal Information Access Evaluation Springer 3\u201314.","DOI":"10.1007\/978-3-642-15998-5_3"},{"key":"e_1_3_1_15_2","unstructured":"A. Gesmundo and T. Samard\u017ei\u0107. 2012. Lemmatisation as a tagging task. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics 2 368\u2013372. http:\/\/www.aclweb.org\/anthology\/P12-2072."},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Thomas M\u00fcller R. Cotterell A. Fraser and H. Sch\u00fctze. 2015. Joint lemmatization and morphological tagging with lemming. 
In Proceedings of the Conference on Empirical Methods in Natural Language Processing Association for Computational Linguistics . 2268\u20132274. http:\/\/aclweb.org\/anthology\/D15-1272.","DOI":"10.18653\/v1\/D15-1272"},{"key":"e_1_3_1_17_2","doi-asserted-by":"crossref","unstructured":"A. Chakrabarty and U. Garain. 2016. BenLem (a Bengali lemmatizer) and its role in WSD. ACM Trans. Asian Low-Resour. Lang. Inf. Process .","DOI":"10.1145\/2835494"},{"key":"e_1_3_1_18_2","doi-asserted-by":"crossref","unstructured":"N. Baruah S. K. Sarma and S. Borkakoty. 2020. Evaluation of content compaction in assamese language. Proc. Comput. Sci. 171 2275\u20132285.","DOI":"10.1016\/j.procs.2020.04.246"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIEV.2016.7760117"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICACCI.2014.6968484"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/2037661.2037664"},{"key":"e_1_3_1_22_2","unstructured":"U. Mishra and C. Meena. 2012. MAULIK: An effective stemmer for Hindi language. Int. J. Comput. Sci. Eng. 4 5 (2012) 711\u2013717."},{"key":"e_1_3_1_23_2","doi-asserted-by":"crossref","unstructured":"R. J. Pratibha and M. C. Padma. 2015. Design of rule based lemmatizer for Kannada inflectional words. In Proceedings of the International Conference on Emerging Research in Electronics Computer Science and Technology (ICERECT\u201915) .","DOI":"10.1109\/ERECT.2015.7499024"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"B. Nathani and G. Purohit. 2019. Design and development of lemmatizer for Sindhi language in devanagri script. J. Stat. Manage. Syst.","DOI":"10.1080\/09720510.2019.1609187"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"H. Patel and B. Patel. 2019. Stemmatizer-stemmer-based lemmatizer for Gujarati text. In Emerging Trends in Expert Applications and Security . 
667\u2013674.","DOI":"10.1007\/978-981-13-2285-3_78"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/2037661.2037664"},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"N. Baruah A. Gogoi and S. K. Sarma. 2020. Utilizing corpus statistics for Assamese Word Sense Disambiguation. In Proceedings of the 4th International Conference on Computing and Network Communications .","DOI":"10.1007\/978-981-33-6987-0_23"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISACC.2015.7377330"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/1459352.1459355"},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database MIT Press.","DOI":"10.7551\/mitpress\/7287.001.0001"},{"key":"e_1_3_1_31_2","doi-asserted-by":"crossref","unstructured":"A. G. Miller M. Chodorow S. Landes C. Leacock and R. G. Thomas. 1994. Using a semantic concordance for sense identification. In Proceedings of the Workshop on Human Language Technology Association for Computational Linguistics . 
240\u2013243.","DOI":"10.3115\/1075812.1075866"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3502157","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3502157","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:49Z","timestamp":1750183789000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3502157"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,17]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,7,31]]}},"alternative-id":["10.1145\/3502157"],"URL":"https:\/\/doi.org\/10.1145\/3502157","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2022,5,17]]},"assertion":[{"value":"2020-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}