{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:26:32Z","timestamp":1759332392698,"version":"3.41.0"},"reference-count":22,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2008,6,1]],"date-time":"2008-06-01T00:00:00Z","timestamp":1212278400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2008,6]]},"abstract":"<jats:p>This article describes an approach to unsupervised learning of\nmorphology from an unannotated corpus for a highly inflectional\nIndo-European language called Assamese spoken by about 30 million\npeople. Although Assamese is one of India\u2019s national languages, it\nutterly lacks computational linguistic resources. There exists no\nprior computational work on this language spoken widely in\nnortheast India. The work presented is pioneering in this respect.\nIn this article, we discuss salient issues in Assamese morphology\nwhere the presence of a large number of suffixal determiners,\nsandhi, samas, and the propensity to use suffix sequences make\napproximately 50% of the words used in written and spoken text\ninflected. We implement methods proposed by Gaussier and Goldsmith\non acquisition of morphological knowledge, and obtain F-measure\nperformance below 60%. This motivates us to present a method more\nsuitable for handling suffix sequences, enabling us to increase the\nF-measure performance of morphology acquisition to almost 70%. We\ndescribe how we build a morphological dictionary for Assamese from\nthe text corpus. 
Using the morphological knowledge acquired and the\nmorphological dictionary, we are able to process small chunks of\ndata at a time as well as a large corpus. We achieve approximately\n85% precision and recall during the analysis of small chunks of\ncoherent text.<\/jats:p>","DOI":"10.1145\/1386869.1386871","type":"journal-article","created":{"date-parts":[[2008,8,27]],"date-time":"2008-08-27T11:56:36Z","timestamp":1219838196000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Acquisition of Morphology of an Indic Language from Text Corpus"],"prefix":"10.1145","volume":"7","author":[{"given":"Utpal","family":"Sharma","sequence":"first","affiliation":[{"name":"Tezpur University"}]},{"given":"Jugal K.","family":"Kalita","sequence":"additional","affiliation":[{"name":"University of Colorado"}]},{"given":"Rajib K.","family":"Das","sequence":"additional","affiliation":[{"name":"Calcutta University"}]}],"member":"320","published-online":{"date-parts":[[2008,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Bora, S. 1968. bahal byaakaran. Jnananath Bora, Guwahati."},{"volume-title":"The Handbook of Morphology","author":"Borer H.","key":"e_1_2_1_2_1","unstructured":"Borer, H. 1998. Morphology and syntax. In The Handbook of Morphology, Spencer, A. and Zwicky, A. M. eds., 151--190, Blackwell Publishers Ltd."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/1018442.1022066"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075096.1075132"},
{"volume-title":"Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON\u201904)","author":"Creutz M.","key":"e_1_2_1_5_1","unstructured":"Creutz, M. and Lagus, K. 2004. Induction of a simple morphology for highly-inflecting languages. 
In Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON\u201904), 43--51."},{"volume-title":"Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR\u201905)","author":"Creutz M.","key":"e_1_2_1_6_1","unstructured":"Creutz, M. and Lagus, K. 2005. Inducing the morphological lexicon of a natural language from unannotated text. In Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR\u201905), 106--113."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/648341.756230"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the Unsupervised Learning in Natural Language Processing Workshop (ACL\u201999)","author":"Gaussier E.","year":"1999","unstructured":"Gaussier, E. 1999. Unsupervised learning of derivational morphology from inflectional lexicons. In Proceedings of the Unsupervised Learning in Natural Language Processing Workshop (ACL\u201999). ACL, 24--30."},
{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.3115\/981732.981771"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120101750300490"},{"volume-title":"asamiyaa byaakaranar moulik bisaar","author":"Goswami G.","key":"e_1_2_1_11_1","unstructured":"Goswami, G. 1990. asamiyaa byaakaranar moulik bisaar. Bina Library, Guwahati, India."},{"volume-title":"Deconstructing morphology: Word formation in syntactic theory","author":"Lieber R.","key":"e_1_2_1_12_1","unstructured":"Lieber, R. 1992. Deconstructing morphology: Word formation in syntactic theory. University of Chicago Press, Chicago, IL."},{"key":"e_1_2_1_13_1","volume-title":"Assamese Grammar and Origin of the Assamese Language","author":"Medhi K.","unstructured":"Medhi, K. 1999. Assamese Grammar and Origin of the Assamese Language. 3rd Ed. 
Lawyer\u2019s Book Stall, Guwahati, India.","edition":"3"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"volume-title":"Proceedings of the International Conference on Natural Language Processing (ICON\u201902)","author":"Saravanan M.","key":"e_1_2_1_15_1","unstructured":"Saravanan, M., Reghv Raj, P. C., Murty, V. S., and Raman, S. 2002. Improved Porter\u2019s algorithm for root word stemming. In Proceedings of the International Conference on Natural Language Processing (ICON\u201902), 21--30."},
{"volume-title":"sahaj byaakaran","author":"Sarma D. D.","key":"e_1_2_1_16_1","unstructured":"Sarma, D. D. 1977. sahaj byaakaran. Assam State Textbook Production and Publication Corporation Ltd., Guwahati-1, India."},{"volume-title":"An introduction to government and binding","author":"Schneider G.","key":"e_1_2_1_17_1","unstructured":"Schneider, G. 1998. An introduction to government and binding. University of Zurich. http:\/\/www.ifi.unizh.ch\/CL\/gschneid\/dreitaegig.ps.gz."},{"volume-title":"Proceedings of the National Workshop on Trends in Advanced Computing (NWTAC\u201906)","author":"Sharma U.","key":"e_1_2_1_19_1","unstructured":"Sharma, U., Das, R., and Kalita, J. 2006. Unsupervised acquisition of morphological features of Assamese from a text corpus. In Proceedings of the National Workshop on Trends in Advanced Computing (NWTAC\u201906), 178--184."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118647.1118648"},{"volume-title":"Proceedings of the 6th International Conference on Computational Intelligence and Natural Computing (CINC\u201903)","author":"Sharma U.","key":"e_1_2_1_21_1","unstructured":"Sharma, U., Kalita, J., and Das, R. 2003. Root word stemming by multiple evidence from corpus. 
In Proceedings of the 6th International Conference on Computational Intelligence and Natural Computing (CINC\u201903)."},
{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118647.1118649"},{"volume-title":"The Ashtadhyayi of Panini (edited and translated into English)","author":"Vasu S. C.","key":"e_1_2_1_23_1","unstructured":"Vasu, S. C. 1891. The Ashtadhyayi of Panini (edited and translated into English), vol. I. Motilal Banarsidass, Delhi, India."}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1386869.1386871","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1386869.1386871","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:57:48Z","timestamp":1750255068000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1386869.1386871"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,6]]},"references-count":22,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2008,6]]}},"alternative-id":["10.1145\/1386869.1386871"],"URL":"https:\/\/doi.org\/10.1145\/1386869.1386871","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2008,6]]},"assertion":[{"value":"2007-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}