{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,5]],"date-time":"2025-07-05T04:47:44Z","timestamp":1751690864782,"version":"3.41.0"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,5,17]],"date-time":"2022-05-17T00:00:00Z","timestamp":1652745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>Reduplication is a productive morphological process widely used in a substantial number of languages in the world. Reduplication is a well-studied phenomenon, and several typological works have provided evidence for different types of reduplication in most of the languages around the world. Addressing reduplication plays a vital role in the efficiency of POS tagger, sentiment analysis, as well as other NLP tasks. However, it is an understudied area in computational linguistics, especially in low-resource languages like Assamese. This article first describes different types of reduplication and their shapes that occur in Assamese. Second, an exhaustive set of reduplication formation rules is compiled that is incorporated to build a system to identify reduplication in Assamese text. The results of the experiments performed on three different domain datasets showed that the rule-based system can identify reduplicated expressions with an average precision, recall, and F1 scores of 94.19%, 98.07%, and 96.07%, respectively. Third, it is shown that the Assamese reduplication processes can be captured through a two-way finite-state transducer (2-way FST). Finally, two broad categories of reduplicative processes along with their corresponding 2-way FST model are presented.<\/jats:p>","DOI":"10.1145\/3510419","type":"journal-article","created":{"date-parts":[[2022,2,3]],"date-time":"2022-02-03T17:54:50Z","timestamp":1643910890000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Reduplication in Assamese: Identification and Modeling"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8094-3620","authenticated-orcid":false,"given":"Dhrubajyoti","family":"Pathak","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Guwahati, Guwahati, India"}]},{"given":"Sukumar","family":"Nandi","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Guwahati, Guwahati, India"}]},{"given":"Priyankoo","family":"Sarmah","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Guwahati, Guwahati, India"}]}],"member":"320","published-online":{"date-parts":[[2022,5,17]]},"reference":[{"issue":"2","key":"e_1_3_1_2_1","first-page":"171","article-title":"Reduplication in Tibeto Burman languages of south Asia","volume":"28","author":"Abbi Anvita","year":"1990","unstructured":"Anvita Abbi. 1990. Reduplication in Tibeto Burman languages of south Asia. Japan. J. South. Asian Stud. 28, 2 (1990), 171\u2013181.","journal-title":"Japan. J. South. Asian Stud."},{"key":"e_1_3_1_3_1","volume-title":"Reduplication in South Asian Languages: An Areal, Typological, and Historical Study","author":"Abbi Anvita","year":"1992","unstructured":"Anvita Abbi. 1992. Reduplication in South Asian Languages: An Areal, Typological, and Historical Study. Allied Publishers Pvt. Ltd, India."},{"key":"e_1_3_1_4_1","volume-title":"Introducing Linguistic Morphology","author":"Bauer Laurie","year":"1988","unstructured":"Laurie Bauer. 1988. Introducing Linguistic Morphology, Vol. 57. Edinburgh University Press Edinburgh."},{"key":"e_1_3_1_5_1","article-title":"Finite-state morphology: Xerox tools and techniques","author":"Beesley Kenneth R.","year":"2003","unstructured":"Kenneth R. Beesley and Lauri Karttunen. 2003. Finite-state morphology: Xerox tools and techniques. CSLI, Stanford (2003).","journal-title":"CSLI, Stanford"},{"volume-title":"Abstract of Speakers\u2019 Strength of Languages and Mother Tongues - 2011","year":"2020","key":"e_1_3_1_6_1","unstructured":"Census. 2020. Abstract of Speakers\u2019 Strength of Languages and Mother Tongues - 2011. Retrieved from http:\/\/censusindia.gov.in\/2011Census\/C-16_25062018_NEW.pdf."},{"key":"e_1_3_1_7_1","first-page":"73","volume-title":"Proceedings of the Workshop on Multiword Expressions: From Theory to Applications","author":"Chakraborty Tanmoy","year":"2010","unstructured":"Tanmoy Chakraborty and Sivaji Bandyopadhyay. 2010. Identification of reduplication in Bengali corpus and their semantic analysis: A rule based approach. In Proceedings of the Workshop on Multiword Expressions: From Theory to Applications. 73\u201376."},{"key":"e_1_3_1_8_1","doi-asserted-by":"publisher","DOI":"10.1162\/LING_a_00265"},{"key":"e_1_3_1_9_1","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2006.32.1.49"},{"key":"e_1_3_1_10_1","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1007\/978-94-009-3401-6_14","volume-title":"The Formal Complexity of Natural Language","author":"Culy Christopher","year":"1985","unstructured":"Christopher Culy. 1985. The complexity of the vocabulary of Bambara. In The Formal Complexity of Natural Language. Springer, 349\u2013357."},{"key":"e_1_3_1_11_1","unstructured":"Satarupa Dattamajumdar. 1999. A Contrastive Study of the Reduplicated Structures in Asamiya Bangla and Odia . Ph.D. Dissertation. Department of Linguistics University of Calcutta Kolkata West Bengal."},{"key":"e_1_3_1_12_1","first-page":"55\u2013 69","article-title":"Reduplication with finite-state technology","volume":"53","author":"Dolatian Hossep","year":"2017","unstructured":"Hossep Dolatian and Jeffrey Heinz. 2017. Reduplication with finite-state technology. Proc. CLS 53 (2017), 55\u2013 69.","journal-title":"Proc. CLS"},{"key":"e_1_3_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5807"},{"issue":"1","key":"e_1_3_1_14_1","first-page":"8","article-title":"RedTyp: A database of reduplication with computational models","volume":"2","author":"Dolatian Hossep","year":"2019","unstructured":"Hossep Dolatian and Jeffrey Heinz. 2019. RedTyp: A database of reduplication with computational models. Proc. Soc. Comput. Ling. 2, 1 (2019), 8\u201318.","journal-title":"Proc. Soc. Comput. Ling."},{"key":"e_1_3_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/371316.371512"},{"key":"e_1_3_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2984450.2984453"},{"key":"e_1_3_1_17_1","volume-title":"Structure of Assamese","author":"Goswami G. C.","year":"1982","unstructured":"G. C. Goswami. 1982. Structure of Assamese. Guwahati University, Guwahati."},{"key":"e_1_3_1_18_1","volume-title":"Fundamentals of Assamese Grammar (, 11th Edition) (Reprint, 2017)","author":"Goswami G. C.","year":"1987","unstructured":"G. C. Goswami. 1987. Fundamentals of Assamese Grammar (, 11th Edition) (Reprint, 2017). Bina Library, Panbazar, Guwahati."},{"key":"e_1_3_1_19_1","volume-title":"An Introduction to Assamese","author":"Goswami U.","year":"1978","unstructured":"U. Goswami. 1978. An Introduction to Assamese. Mani Manik Prakash, Panbazar, Guwahati."},{"key":"e_1_3_1_20_1","volume-title":"Asamiya Bhashar Vyakarana","author":"Goswami U.","year":"1981","unstructured":"U. Goswami. 1981. Asamiya Bhashar Vyakarana, (10th ed). 2011. Mani Manik Prakash, Panbazar, Guwahati."},{"key":"e_1_3_1_21_1","unstructured":"John E. Hopcroft and Jeffrey D. Ullman. 1969. Formal languages and their relation to automata. Addison-Wesley Longman Publishing Co. Inc."},{"key":"e_1_3_1_22_1","first-page":"207","volume-title":"Proceedings of the Conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop","author":"Hulden Mans","year":"2009","unstructured":"Mans Hulden and Shannon T. Bischoff. 2009. A simple formalism for capturing reduplication in finite-state morphology. In Proceedings of the Conference on Finite-State Methods and Natural Language Processing: Post-proceedings of the 7th International Workshop. 207\u2013214."},{"key":"e_1_3_1_23_1","doi-asserted-by":"publisher","DOI":"10.1515\/9783110911466"},{"key":"e_1_3_1_24_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511627712"},{"key":"e_1_3_1_25_1","volume-title":"Indic NLP Library","author":"Kunchukuttan Anoop","year":"2019","unstructured":"Anoop Kunchukuttan. 2019.Indic NLP Library. Retrieved from https:\/\/github.com\/anoopkunchukuttan\/indic_nlp_resources.git."},{"key":"e_1_3_1_26_1","first-page":"707","volume-title":"Sov. Phys. Doklady","author":"Levenshtein Vladimir I.","year":"1966","unstructured":"Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. In Sov. Phys. Doklady, Vol. 10. 707\u2013710."},{"key":"e_1_3_1_27_1","unstructured":"John J. McCarthy and Alan S. Prince. 1995. Faithfulness and reduplicative identity. Linguistics Department Faculty Publication Series (1995) 10."},{"key":"e_1_3_1_28_1","unstructured":"Kishorjit Nongmeikapam and Sivaji Bandyopadhyay. 2011. Identification of reduplicated MWEs in Manipuri: A rule based approach. In Proceedings of 23rd International Conference on the Computer Processing of Oriental Languages (ICCPOL\u201910) . 49\u201354."},{"key":"e_1_3_1_29_1","article-title":"Assamese reduplication identification system","author":"Pathak D.","year":"2020","unstructured":"D. Pathak. 2020. Assamese reduplication identification system. Retrieved from https:\/\/github.com\/anononymus\/assamese-redup.","journal-title":"Retrieved from https:\/\/github.com\/anononymus\/assamese-redup"},{"key":"e_1_3_1_30_1","volume-title":"Computational Approaches to Morphology and Syntax","author":"Roark Brian","year":"2007","unstructured":"Brian Roark and Richard William Sproat. 2007. Computational Approaches to Morphology and Syntax, Vol. 4. Oxford University Press."},{"key":"e_1_3_1_31_1","volume-title":"Studies on Reduplication","author":"Rubino Carl","year":"2005","unstructured":"Carl Rubino. 2005. Reduplication: Form, function and distribution. Studies on Reduplication 28 (2005), 11\u201329."},{"key":"e_1_3_1_32_1","volume-title":"The World Atlas of Language Structures Online","author":"Rubino Carl","year":"2013","unstructured":"Carl Rubino. 2013. Reduplication. In The World Atlas of Language Structures Online, Matthew S. Dryer and Martin Haspelmath (Eds.). Max Planck Institute for Evolutionary Anthropology, Leipzig. Retrieved from https:\/\/wals.info\/chapter\/27."},{"key":"e_1_3_1_33_1","volume-title":"Assamese Grammar and Usage: An Analytical Studies of Assamese Grammar and Usage","author":"Saikia Bora L.","year":"2016","unstructured":"L. Saikia Bora. 2016. Assamese Grammar and Usage: An Analytical Studies of Assamese Grammar and Usage. Chandra Prakash, Guwahati, Panbazar, Guwahati."},{"issue":"28","key":"e_1_3_1_34_1","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1515\/9783110911466.263","article-title":"Reduplication in modern Hindi and the theory of reduplication","author":"Singh Rajendra","year":"2005","unstructured":"Rajendra Singh. 2005. Reduplication in modern Hindi and the theory of reduplication. Stud. Redup.28 (2005), 263.","journal-title":"Stud. Redup."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3510419","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3510419","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:45Z","timestamp":1750183785000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3510419"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,17]]},"references-count":33,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3510419"],"URL":"https:\/\/doi.org\/10.1145\/3510419","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2022,5,17]]},"assertion":[{"value":"2020-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}