{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T18:42:24Z","timestamp":1755801744176,"version":"3.44.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>\n            In multilingual societies, people tend to mix multiple languages for communication. This phenomenon is known as code-mixing or code-switching. This is visible more on social media platforms and e-commerce websites to share their opinions and feelings. Automatic language detection has become a crucial step in language processing for subsequent tasks. This study focuses on detecting the language of Kannada\u2013English code-mixed sentences at the word level. The dataset is prepared by annotation based on rules prior to the detection. The transformer-based approach is applied using BERT and its variants. The models label words as English, Kannada, Mixed, Named Entity, and Universal for the given code-mixed texts. The highest accuracy (98%) was obtained using XLM-RoBERTa\n            <jats:bold>\n              <jats:italic toggle=\"yes\">.<\/jats:italic>\n            <\/jats:bold>\n          <\/jats:p>","DOI":"10.1145\/3748310","type":"journal-article","created":{"date-parts":[[2025,7,11]],"date-time":"2025-07-11T11:08:06Z","timestamp":1752232086000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["KaEnLandetector: Rule-Based Language Annotation and Transformer-Based Language Detection for Kannada-English Code-Mixed Text"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-3002-5702","authenticated-orcid":false,"given":"Rashmi","family":"K B","sequence":"first","affiliation":[{"name":"Department of Information Science and Engineering, B.M.S. College of Engineering","place":["Bangalore, India"]},{"name":"Visvesvaraya Technological University","place":["Bangalore, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2825-7169","authenticated-orcid":false,"given":"H S","family":"Guruprasad","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, B.M.S. College of Engineering","place":["Bangalore, India"]},{"name":"Visvesvaraya Technological University","place":["Bangalore, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3389-4253","authenticated-orcid":false,"given":"Shambhavi","family":"B R","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering (DS), B.M.S. College of Engineering","place":["Bangalore, India"]},{"name":"Visvesvaraya Technological University","place":["Bangalore, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,8,20]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"54","volume-title":"Proceedings of the 3rd Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media","author":"Hande A.","year":"2020","unstructured":"A. Hande, R. Priyadharshini, and B. R. Chakravarthi. 2020. KanCMD: Kannada codemixed dataset for sentiment analysis and offensive language detection. In Proceedings of the 3rd Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media. 54\u201363."},{"key":"e_1_3_2_3_2","article-title":"Comparison of different orthographies for machine translation of under-resourced Dravidian languages","author":"Chakravarthi B. R.","year":"2019","unstructured":"B. R. Chakravarthi, M. Arcan, and J. P. McCrae. 2019. Comparison of different orthographies for machine translation of under-resourced Dravidian languages. In Proceedings of the 2nd Conference on Language, Data and Knowledge (LDK 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.","journal-title":"Proceedings of the 2nd Conference on Language, Data and Knowledge (LDK 2019)"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3902"},{"issue":"4","key":"e_1_3_2_5_2","first-page":"388","article-title":"A survey of language identification techniques and applications","volume":"6","author":"Garg A.","year":"2014","unstructured":"A. Garg, V. Gupta, and M. Jindal. 2014. A survey of language identification techniques and applications. Journal of Emerging Technologies in Web Intelligence 6, 4 (2014), 388\u2013400.","journal-title":"Journal of Emerging Technologies in Web Intelligence"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAE.2009.35"},{"key":"e_1_3_2_7_2","volume-title":"EACL| VarDial","author":"Chakravarthi B. R.","year":"2021","unstructured":"B. R. Chakravarthi, M. G\u0103man, R. T. Ionescu, H. Jauhiainen, T. Jauhiainen, K. Lind\u00e9n, N. Ljube\u0161i\u0107, N. Partanen, R. Priyadharshini, C. Purschke, and E. Rajagopal. 2021. Findings of the VarDial evaluation campaign 2021. In EACL| VarDial. Association for Computational Linguistics."},{"issue":"4","key":"e_1_3_2_8_2","first-page":"24","article-title":"Language identification of Kannada language using n-Gram","volume":"6","author":"Deepamala N.","year":"2012","unstructured":"N. Deepamala and P. R. Kumar. 2012. Language identification of Kannada language using n-Gram. International Journal of Computer Applications 6, 4 (2012), 24\u201328.","journal-title":"International Journal of Computer Applications"},{"key":"e_1_3_2_9_2","first-page":"38","volume-title":"Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts","author":"Balouchzahi F.","year":"2022","unstructured":"F. Balouchzahi, S. Butt, A. Hegde, N. Ashraf, H. L. Shashirekha, G. Sidorov, and A. Gelbukh. 2022. Overview of CoLI-Kanglish: Word level language identification in code-mixed Kannada-English texts at ICON 2022. In Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts. 38\u201345."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2824864.2824871"},{"key":"e_1_3_2_11_2","first-page":"012027","volume-title":"IOP Conference Series: Materials Science and Engineering","volume":"1020","author":"Kalita N. J.","year":"2021","unstructured":"N. J. Kalita, A. G. Agarwala, and J. Das. 2021. Word level language identification on code-mixed English-Bodo text. In IOP Conference Series: Materials Science and Engineering (Vol. 1020, No. 1, p. 012027). IOP Publishing."},{"key":"e_1_3_2_12_2","first-page":"1","volume-title":"Proceedings of the 2018 3rd International Conference on Information Technology Research (ICITR)","author":"Shanmugalingam K.","year":"2018","unstructured":"K. Shanmugalingam, S. Sumathipala, and C. Premachandra. 2018. Word level language identification of code mixing text in social media using NLP. In Proceedings of the 2018 3rd International Conference on Information Technology Research (ICITR). IEEE, 1\u20135."},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"857","DOI":"10.18653\/v1\/D13-1084","article-title":"Word level language identification in online multilingual communication","author":"Nguyen D.","year":"2013","unstructured":"D. Nguyen and A. S. Dogru\u00f6z. 2013. Word level language identification in online multilingual communication. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, A meeting of SIGDAT, a Special Interest Group of the ACL. Association for Computational Linguistics, 857\u2013862.","journal-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing"},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","first-page":"51","DOI":"10.18653\/v1\/W18-3206","volume-title":"Proceedings of the 3rd Workshop on Computational Approaches to Linguistic Code-switching","author":"Mave D.","year":"2018","unstructured":"D. Mave, S. Maharjan, and T. Solorio. 2018. Language identification and analysis of code-switched social media text. In Proceedings of the 3rd Workshop on Computational Approaches to Linguistic Code-switching. 51\u201361."},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","first-page":"50","DOI":"10.18653\/v1\/W16-5806","volume-title":"Proceedings of the 2nd Workshop on Computational Approaches to Code Switching","author":"Samih Y.","year":"2016","unstructured":"Y. Samih, S. Maharjan, M. Attia, L. Kallmeyer, and T. Solorio. 2016. Multilingual code-switching identification via LSTM recurrent neural networks. In Proceedings of the 2nd Workshop on Computational Approaches to Code Switching. 50\u201359."},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","first-page":"52","DOI":"10.18653\/v1\/P18-3008","volume-title":"Proceedings of the ACL 2018, Student Research Workshop","author":"Singh K.","year":"2018","unstructured":"K. Singh, I. Sen, and P. Kumaraguru. 2018. Language identification and named entity recognition in Hinglish code mixed tweets. In Proceedings of the ACL 2018, Student Research Workshop. 52\u201358."},{"key":"e_1_3_2_17_2","first-page":"248","volume-title":"Proceedings of the 2021 IEEE 20th International Conference on Cognitive Informatics & Cognitive Computing (ICCI* CC)","author":"Ansari M. Z.","year":"2021","unstructured":"M. Z. Ansari, M. S. Beg, T. Ahmad, M. J. Khan, and G. Wasim. 2021. Language identification of Hindi-English tweets using code-mixed BERT. In Proceedings of the 2021 IEEE 20th International Conference on Cognitive Informatics & Cognitive Computing (ICCI* CC). IEEE, 248\u2013252."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.array.2021.100104"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Soumil Mandal and Anil Kumar Singh. 2018. Language identification in code-mixed data using multichannel neural networks and context capture. arXiv preprint arXiv:1808.07118.","DOI":"10.18653\/v1\/W18-6116"},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1109\/IALP48816.2019.9037680","volume-title":"Proceedings of the 2019 International Conference on Asian Language Processing (IALP)","author":"Smith I.","year":"2019","unstructured":"I. Smith and U. Thayasivam. 2019. Language detection in Sinhala-English code-mixed data. In Proceedings of the 2019 International Conference on Asian Language Processing (IALP). IEEE, 228\u2013233."},{"volume-title":"Proc. 11th Int. Conf. Natural Lang. Process.","key":"e_1_3_2_21_2","unstructured":"A. Das and B. Gamb\u00e4ck. 2014. Identifying languages at the word level in codemixed Indian social media text. In Proc. 11th Int. Conf. Natural Lang. Process. (2014), 378--387."},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1007\/978-981-16-3690-5_73","volume-title":"ICDSMLA 2020: Proceedings of the 2nd International Conference on Data Science, Machine Learning and Applications","author":"Joshi R.","year":"2022","unstructured":"R. Joshi and R. Joshi. 2022. Evaluating input representation for language identification in Hindi-English code mixed text. In ICDSMLA 2020: Proceedings of the 2nd International Conference on Data Science, Machine Learning and Applications. Springer Singapore, 795\u2013802."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1515\/jisys-2017-0440"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.5334\/johd.44"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1804.05095"},{"key":"e_1_3_2_26_2","first-page":"1552","volume-title":"Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI)","author":"Veena P. V.","year":"2017","unstructured":"P. V. Veena, M. A. Kumar, and K. P. Soman. 2017. An effective way of word-level language identification for code-mixed Facebook comments using word-embedding via character-embedding. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 1552\u20131556."},{"key":"e_1_3_2_27_2","first-page":"1","volume-title":"Proceedings of the 2017 2nd International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS)","author":"Lakshmi B. S.","year":"2017","unstructured":"B. S. Lakshmi and B. R. Shambhavi. 2017. An automatic language identification system for code-mixed English-Kannada social media text. In Proceedings of the 2017 2nd International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS). IEEE, 1\u20135."},{"key":"e_1_3_2_28_2","first-page":"76","volume-title":"Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference","author":"Dutta A.","year":"2022","unstructured":"A. Dutta. 2022. Word-level language identification using subword embeddings for code-mixed Bangla-English social media data. In Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference. 76\u201382."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0217984920500864"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3104106"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2111.09811"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2014.12.004"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"volume-title":"arXiv preprint arXiv:1909.11942","key":"e_1_3_2_34_2","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019)."},{"key":"e_1_3_2_35_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","unstructured":"A. Conneau K. Khandelwal N. Goyal V. Chaudhary G. Wenzek F. Guzm\u00e1n E. Grave M. Ott L. Zettlemoyer and V. Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In ACL. 10.18653\/v1\/2020.acl-main.747","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"e_1_3_2_37_2","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: smaller faster cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)."},{"key":"e_1_3_2_38_2","unstructured":"Z. Yang Z. Dai Y. Yang J. Carbonell R. R. Salakhutdinov and Q. V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Proc. Adv. Neural Inf. Process. Syst. 32 (2019)."},{"key":"e_1_3_2_39_2","unstructured":"G. Lample and A. Conneau. 2019. Cross-lingual language model pretraining. arXiv 2019 arXiv:1901.07291."},{"key":"e_1_3_2_40_2","unstructured":"Kevin Clark Minh-Thang Luong Quoc V. Le and Christopher D. Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. International Conference on Learning Representations (ICLR'20)."},{"key":"e_1_3_2_41_2","unstructured":"F. N. Iandola S. Han M. W. Moskewicz K. Ashraf W. J. Dally and K. Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50\u00d7 fewer parameters and <0.5 MB model size. arXiv 2016 arXiv:1602.07360."},{"key":"e_1_3_2_42_2","unstructured":"Atnafu Lambebo Tonja Mesay Gemeda Yigezu Olga Kolesnikova Moein Shahiki Tash Grigori Sidorov and Alexander Gelbuk. 2022. Transformer-based model for word level language identification in code-mixed kannada-english texts. arXiv preprint arXiv:2211.14459."},{"key":"e_1_3_2_43_2","unstructured":"H. L. Shashirekha F. Balouchzahi M. D. Anusha and G. Sidorov. 2022. Coli-machine learning approaches for code-mixed language identification at the word level in kannada-english texts. arXiv preprint arXiv:2211.09847."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748310","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,20]],"date-time":"2025-08-20T18:08:09Z","timestamp":1755713289000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748310"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,20]]},"references-count":42,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3748310"],"URL":"https:\/\/doi.org\/10.1145\/3748310","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2025,8,20]]},"assertion":[{"value":"2023-05-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}