{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T17:11:20Z","timestamp":1774631480310,"version":"3.50.1"},"reference-count":82,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,3,29]],"date-time":"2020-03-29T00:00:00Z","timestamp":1585440000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,7,31]]},"abstract":"<jats:p>\n            Social media data has become invaluable component of business analytics. A multitude of nuances of social media text make the job of conventional text analytical tools difficult. Code-mixing of text is a phenomenon prevalent among social media users, wherein words used are borrowed from multiple languages, though written in the commonly understood roman script. All the existing supervised learning methods for tasks such as Parts Of Speech (POS) tagging for code-mixed social media (CMSM) text typically depend on a large amount of training data. Preparation of such large training data is resource-intensive, requiring expertise in multiple languages. Though the preparation of small dataset is possible, the out of vocabulary (OOV) words pose major difficulty, while learning models from CMSM text as the number of different ways of writing non-native words in roman script is huge. POS tagging for code-mixed text is non-trivial, as tagging should deal with syntactic rules of multiple languages. The important research question addressed by this article is whether abundantly available unlabeled data can help in resolving the difficulties posed by code-mixed text for POS tagging. We develop an approach for scraping and building word embeddings for code-mixed text illustrating it for\n            <jats:italic>Bengali-English, Hindi-English,<\/jats:italic>\n            and\n            <jats:italic>Telugu-English<\/jats:italic>\n            code-mixing scenarios. We used a hierarchical deep recurrent neural network with linear-chain CRF layer on top of it to improve the performance of POS tagging in CMSM text by capturing contextual word features and character-sequence\u2013based information. We prepared a labeled resource for POS tagging of CMSM text by correcting 19% of labels from an existing resource. A detailed analysis of the performance of our approach with varying levels of code-mixing is provided. The results indicate that the F1-score of our approach with custom embeddings is better than the CRF-based baseline by 5.81%, 5.69%, and 6.3% in\n            <jats:italic>Bengali, Hindi<\/jats:italic>\n            , and\n            <jats:italic>Telugu<\/jats:italic>\n            languages, respectively.\n          <\/jats:p>","DOI":"10.1145\/3380967","type":"journal-article","created":{"date-parts":[[2020,3,29]],"date-time":"2020-03-29T09:42:37Z","timestamp":1585474957000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Improving Code-mixed POS Tagging Using Code-mixed Embeddings"],"prefix":"10.1145","volume":"19","author":[{"given":"S. Nagesh","family":"Bhattu","sequence":"first","affiliation":[{"name":"National Institute of Technology Andhra Pradesh, Andhra Pradesh, India"}]},{"given":"Satya Krishna","family":"Nunna","sequence":"additional","affiliation":[{"name":"IDRBT and National Institute of Technology, Andhra Pradesh, India"}]},{"given":"D. V. L. N.","family":"Somayajulu","sequence":"additional","affiliation":[{"name":"National Institute of Technology and IIITDMKL, Warangal, Andhra Pradesh, India"}]},{"given":"Binay","family":"Pradhan","sequence":"additional","affiliation":[{"name":"International Institute of Information Technology, Odisha, India"}]}],"member":"320","published-online":{"date-parts":[[2020,3,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-3222"},{"key":"e_1_2_1_2_1","first-page":"6","article-title":"On constituent chunking for","volume":"54","author":"Aslan Ozkan","year":"2018","journal-title":"Turkish. Inf. Proc. Manag."},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 13th International Conference on Natural Language Processing. NLP Association of India, 154--160","author":"Athavale Vinayak","year":"2016"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC\u201906)","author":"Atserias J."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2017.12.004"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3914"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3902"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/176313.176316"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3908"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00104"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the International Conference on Machine Learning. 2067--2075","author":"Chung Junyoung","year":"2015"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390177"},{"key":"e_1_2_1_14_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2016.06.001"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2015.09.008"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Gamb\u00e4ck Bj\u00f6rn","year":"2016"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-5811"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 49th Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers -","volume":"2","author":"Gimpel Kevin","year":"2002"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2010.08.008"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 13th International Conference on Natural Language Processing. NLP Association of India, 249--258","author":"Gupta Deepak","year":"2016"},{"key":"e_1_2_1_22_1","volume-title":"SMPOST: Parts of speech tagger for code-mixed Indic social media text. arXiv preprint arXiv:1702.00167","author":"Gupta Deepak","year":"2017"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609622"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1080\/01434632.1992.9994482"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.3115\/981623.981638"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1228"},{"key":"e_1_2_1_28_1","first-page":"16","volume-title":"Proceedings of the 4th International Workshop on Natural Language Processing for Social Media (SocialNLP@EMNLP\u201916)","author":"Jaech Aaron","year":"1865"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the International Conference on Recent Advances in Natural Language Processing. INCOMA Ltd., 239--248","author":"Jamatia Anupam","year":"2015"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, 2482--2491","author":"Joshi Aditya","year":"2016"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-3401"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC\u201902)","author":"Kawahara Daisuke","year":"2002"},{"key":"e_1_2_1_33_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2014"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 13th International Conference on Natural Language Processing. NLP Association of India, 81--89","author":"Kumar Subham","year":"2016"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 18th International Conference on Machine Learning (ICML\u201901)","author":"Lafferty John D."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1030"},{"key":"e_1_2_1_37_1","first-page":"12","volume-title":"Proceedings of the International Conference on Computational Linguistics (COLING\u201912)","author":"Li Ying","year":"2012"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2012.05.006"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1104"},{"key":"e_1_2_1_40_1","volume-title":"CEDR: Contextualized embeddings for document ranking. arXiv preprint arXiv:1904.07094","author":"MacAvaney Sean","year":"2019"},{"key":"e_1_2_1_41_1","first-page":"2","article-title":"Building a large annotated corpus of English","volume":"19","author":"Marcus Mitchell P.","year":"1993","journal-title":"The Penn Treebank. Comput. Linguist."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219852"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems. 3111--3119","author":"Mikolov Tomas","year":"2013"},{"key":"e_1_2_1_44_1","first-page":"10","volume-title":"Proc. 18","author":"Murthy Rudra","year":"2018"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.5555\/1557769.1557832"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.3115\/1698381.1698416"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220365"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 380--390","author":"Owoputi Olutobi"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.physa.2016.01.015"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-1609"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912)","author":"Petrov Slav","year":"2012"},{"key":"e_1_2_1_53_1","volume-title":"Pimpale and Raj Nath Patel","author":"Prakash","year":"2016"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1143"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1344"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.5555\/108235.108253"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2740908.2743006"},{"key":"e_1_2_1_58_1","volume-title":"Rao and Sobha Lalitha Devi","author":"Pattabhi R.","year":"2016"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.5555\/1596374.1596399"},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing.","author":"Ratnaparkhi Adwait","year":"1996"},{"key":"e_1_2_1_61_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1524--1534","author":"Ritter Alan","year":"2011"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2017.07.003"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1121"},{"key":"e_1_2_1_64_1","volume-title":"Part-of-speech tagging for code-mixed Indian social media text at ICON","author":"Sarkar Kamal","year":"2015"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_2_1_66_1","volume-title":"Neural networks for speech processing. Encyclopedia of Electrical and Electronic Engineering","author":"Schuster Mike"},{"key":"e_1_2_1_67_1","volume-title":"Proceedings of the 12th International Conference on Natural Language Processing. NLP Association of India, 237--246","author":"Sequiera Royal","year":"2015"},{"key":"e_1_2_1_68_1","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 1340--1345","author":"Sharma Arnav"},{"key":"e_1_2_1_69_1","volume-title":"Automatic normalization of word variations in code-mixed social media text. arXiv preprint arXiv:1804.00804","author":"Singh Rajat","year":"2018"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3907"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.5555\/1613715.1613841"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.5555\/1613715.1613852"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3158354.3158357"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1527"},{"key":"e_1_2_1_75_1","volume-title":"Proceedings of the 28th International Conference on Machine Learning (ICML\u201911)","author":"Sutskever Ilya"},{"key":"e_1_2_1_76_1","first-page":"1453","article-title":"Large margin methods for structured and interdependent output variables","author":"Tsochantaridis Ioannis","year":"2005","journal-title":"J. Mach. Learn. Res. 6"},{"key":"e_1_2_1_77_1","first-page":"65","article-title":"Character embedding for language identification in Hindi-English code-mixed social media text","volume":"22","author":"Veena P. V.","year":"2018","journal-title":"Comput. Sistemas"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1105"},{"key":"e_1_2_1_79_1","volume-title":"Yu","author":"Xu Hu","year":"2019"},{"key":"e_1_2_1_80_1","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 72--77","author":"Yang Wei","year":"2019"},{"key":"e_1_2_1_81_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 647--657","author":"Zheng Xiaoqing","year":"2013"},{"key":"e_1_2_1_82_1","volume-title":"SDNet: Contextualized attention-based deep network for conversational question answering. arXiv preprint arXiv:1812.03593","author":"Zhu Chenguang","year":"2018"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3380967","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3380967","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:32:46Z","timestamp":1750199566000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3380967"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,29]]},"references-count":82,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,7,31]]}},"alternative-id":["10.1145\/3380967"],"URL":"https:\/\/doi.org\/10.1145\/3380967","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,29]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-03-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}