{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:11:04Z","timestamp":1777734664700,"version":"3.51.4"},"reference-count":83,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,2,4]],"date-time":"2022-02-04T00:00:00Z","timestamp":1643932800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,2,4]],"date-time":"2022-02-04T00:00:00Z","timestamp":1643932800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001602","name":"Science Foundation Ireland","doi-asserted-by":"crossref","award":["SFI\/12\/RC\/2289_P2 (Insight_2)"],"award-info":[{"award-number":["SFI\/12\/RC\/2289_P2 (Insight_2)"]}],"id":[{"id":"10.13039\/501100001602","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100002081","name":"Irish Research Council","doi-asserted-by":"publisher","award":["IRCLA\/2017\/129"],"award-info":[{"award-number":["IRCLA\/2017\/129"]}],"id":[{"id":"10.13039\/501100002081","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National University Ireland, Galway"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Lang Resources 
& Evaluation"],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper describes the development of a multilingual, manually annotated dataset for three under-resourced Dravidian languages generated from social media comments. More than 60,000 YouTube comments in total were annotated for sentiment analysis and offensive language identification. The dataset consists of around 44,000 comments in Tamil-English, around 7,000 comments in Kannada-English, and around 20,000 comments in Malayalam-English. The data was manually annotated by volunteer annotators and shows high inter-annotator agreement as measured by Krippendorff\u2019s alpha. The dataset contains all types of code-mixing phenomena since it comprises user-generated content from a multilingual country. We also present baseline experiments to establish benchmarks on the dataset using machine learning and deep learning methods. The dataset is available on GitHub and Zenodo.<\/jats:p>","DOI":"10.1007\/s10579-022-09583-7","type":"journal-article","created":{"date-parts":[[2022,2,4]],"date-time":"2022-02-04T20:02:36Z","timestamp":1644004956000},"page":"765-806","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":87,"title":["DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text"],"prefix":"10.1007","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4575-7934","authenticated-orcid":false,"given":"Bharathi 
Raja","family":"Chakravarthi","sequence":"first","affiliation":[]},{"given":"Ruba","family":"Priyadharshini","sequence":"additional","affiliation":[]},{"given":"Vigneshwaran","family":"Muralidaran","sequence":"additional","affiliation":[]},{"given":"Navya","family":"Jose","sequence":"additional","affiliation":[]},{"given":"Shardul","family":"Suryawanshi","sequence":"additional","affiliation":[]},{"given":"Elizabeth","family":"Sherly","sequence":"additional","affiliation":[]},{"given":"John P.","family":"McCrae","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,2,4]]},"reference":[{"key":"9583_CR1","unstructured":"Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of twitter data. In: Proceedings of the workshop on language in social media (LSM 2011) (pp. 30\u201338). Portland, Oregon: Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/W11-0705"},{"key":"9583_CR2","unstructured":"Agrawal, R., Chenthil\u00a0Kumar, V., Muralidharan, V., & Sharma, D. (2018). No more beating about the bush: A step towards idiom handling for Indian language NLP. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC-2018). Miyazaki, Japan: European Language Resources Association (ELRA). https:\/\/www.aclweb.org\/anthology\/L18-1048"},{"key":"9583_CR83","unstructured":"Andronov, M. S. (1970). Dravidian languages (p. 190). Nauka Publishing House, Central Department of Oriental Literature."},{"key":"9583_CR3","doi-asserted-by":"publisher","unstructured":"Bali, K., Sharma, J., Choudhury, M., & Vyas, Y. (2014). \u201cI am borrowing ya mixing?\u201d An analysis of English-Hindi code mixing in Facebook. In: Proceedings of the first workshop on computational approaches to code switching (pp. 116\u2013126). Doha, Qatar: Association for Computational Linguistics. 
https:\/\/doi.org\/10.3115\/v1\/W14-3914, https:\/\/www.aclweb.org\/anthology\/W14-3914","DOI":"10.3115\/v1\/W14-3914"},{"key":"9583_CR4","doi-asserted-by":"publisher","unstructured":"Barman, U., Das, A., Wagner, J., & Foster, J. (2014). Code mixing: A challenge for language identification in the language of social media. In: Proceedings of the first workshop on computational approaches to code switching (pp. 13\u201323). Doha, Qatar: Association for Computational Linguistics. https:\/\/doi.org\/10.3115\/v1\/W14-3902, https:\/\/www.aclweb.org\/anthology\/W14-3902","DOI":"10.3115\/v1\/W14-3902"},{"key":"9583_CR5","volume-title":"Print, Folklore, and Nationalism in Colonial South India","author":"SH Blackburn","year":"2006","unstructured":"Blackburn, S. H. (2006). Print, Folklore, and Nationalism in Colonial South India. New Delhi: Orient Blackswan."},{"issue":"1","key":"9583_CR6","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5\u201332.","journal-title":"Machine Learning"},{"key":"9583_CR7","doi-asserted-by":"crossref","unstructured":"Caruana, R., & Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning (pp. 161\u2013168)","DOI":"10.1145\/1143844.1143865"},{"key":"9583_CR8","unstructured":"Chakravarthi, B. R. (2020). HopeEDI: A multilingual hope speech detection dataset for equality, diversity, and inclusion. In: Proceedings of the third workshop on computational modeling of people\u2019s opinions, personality, and emotion\u2019s in social media (pp. 41\u201353). Barcelona, Spain (Online): Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/2020.peoples-1.5"},{"key":"9583_CR9","unstructured":"Chakravarthi, B. R., & Muralidaran, V. (2021). 
Findings of the shared task on hope speech detection for equality, diversity, and inclusion. In: Proceedings of the first workshop on language technology for equality, diversity and inclusion (pp. 61\u201372). Kyiv: Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/2021.ltedi-1.8"},{"key":"9583_CR10","unstructured":"Chakravarthi, B. R., Anand\u00a0Kumar, M., McCrae, J. P., Premjith, B., Soman, K., & Mandl, T. (2020a). Overview of the track on HASOC-offensive Language Identification-DravidianCodeMix. In: Working notes of the forum for information retrieval evaluation (FIRE 2020). CEUR Workshop Proceedings, CEUR-WS.org"},{"key":"9583_CR11","unstructured":"Chakravarthi, B. R., Jose, N., Suryawanshi, S., Sherly, E., & McCrae, J. P. (2020b). A sentiment analysis dataset for code-mixed Malayalam-English. In: Proceedings of the 1st joint workshop of SLTU (spoken language technologies for under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) (SLTU-CCURL 2020). Marseille, France: European Language Resources Association (ELRA)."},{"key":"9583_CR12","unstructured":"Chakravarthi, B. R., Muralidaran, V., Priyadharshini, R., & McCrae, J. P. (2020c). Corpus creation for sentiment analysis in code-mixed Tamil-English text. In: Proceedings of the 1st joint workshop of SLTU (spoken language technologies for under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) (SLTU-CCURL 2020). Marseille, France: European Language Resources Association (ELRA)"},{"key":"9583_CR13","doi-asserted-by":"publisher","unstructured":"Chakravarthi, B. R., Priyadharshini, R., Muralidaran, V., Suryawanshi, S., Jose, N., Sherly, E., & McCrae, J. P. (2020). Overview of the Track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text. In: Forum for information retrieval evaluation (pp. 21\u201324). New York, NY, USA, FIRE: Association for Computing Machinery. 
https:\/\/doi.org\/10.1145\/3441501.3441515","DOI":"10.1145\/3441501.3441515"},{"key":"9583_CR14","unstructured":"Chakravarthi, B. R., Priyadharshini, R., Jose, N., Kumar,\u00a0M. A., Mandl, T., Kumaresan, P. K., Ponnusamy, R., R\u00a0L H, McCrae, J. P., & Sherly, E. (2021). Findings of the shared task on offensive language identification in Tamil, Malayalam, and Kannada. In: Proceedings of the first workshop on speech and language technologies for dravidian languages (pp. 133\u2013145). Kyiv: Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/2021.dravidianlangtech-1.17"},{"key":"9583_CR15","doi-asserted-by":"publisher","unstructured":"Chanda, A., Das, D., & Mazumdar, C. (2016). Unraveling the English-Bengali code-mixing phenomenon. In: Proceedings of the second workshop on computational approaches to code switching (pp. 80\u201389). Austin, TX: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W16-5810, https:\/\/www.aclweb.org\/anthology\/W16-5810","DOI":"10.18653\/v1\/W16-5810"},{"key":"9583_CR16","doi-asserted-by":"publisher","unstructured":"Cieliebak, M., Deriu, J. M., Egger, D., & Uzdilli, F. (2017). A twitter corpus and benchmark resources for German sentiment analysis. In: Proceedings of the fifth international workshop on natural language processing for social media (pp. 45\u201351). Valencia, Spain: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W17-1106, https:\/\/www.aclweb.org\/anthology\/W17-1106","DOI":"10.18653\/v1\/W17-1106"},{"key":"9583_CR17","doi-asserted-by":"publisher","unstructured":"Clarke, I., & Grieve, J. (2017). Dimensions of abusive language on twitter. In: Proceedings of the first workshop on abusive language online (pp. 1\u201310). Vancouver, BC, Canada: Association for Computational Linguistics. 
https:\/\/doi.org\/10.18653\/v1\/W17-3001, https:\/\/www.aclweb.org\/anthology\/W17-3001","DOI":"10.18653\/v1\/W17-3001"},{"key":"9583_CR18","doi-asserted-by":"publisher","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2020). Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 8440\u20138451). Association for Computational Linguistics, Online. https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.747, https:\/\/www.aclweb.org\/anthology\/2020.acl-main.747","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"9583_CR19","doi-asserted-by":"publisher","unstructured":"Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers) (pp. 4171\u20134186). Minneapolis, Minnesota: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/N19-1423, https:\/\/www.aclweb.org\/anthology\/N19-1423","DOI":"10.18653\/v1\/N19-1423"},{"key":"9583_CR20","doi-asserted-by":"crossref","unstructured":"Ekbal, A., & Bandyopadhyay, S. (2008). Bengali named entity recognition using support vector machine. NER for South and South East Asian Languages (p. 51)","DOI":"10.1109\/ICAPR.2009.86"},{"key":"9583_CR21","doi-asserted-by":"crossref","unstructured":"El\u00a0Boukkouri, H., Ferret, O., Lavergne, T., Noji, H., Zweigenbaum, P., & Tsujii, J. (2020). CharacterBERT: Reconciling ELMo and BERT for word-level open-vocabulary representations from characters. In: Proceedings of the 28th international conference on computational linguistics (pp. 6903\u20136915). 
Barcelona, Spain (Online): International Committee on Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/2020.coling-main.609","DOI":"10.18653\/v1\/2020.coling-main.609"},{"key":"9583_CR22","unstructured":"Gai, G. S. (1996). Inscriptions of the early Kadambas. Indian"},{"key":"9583_CR23","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1198\/004017007000000245","volume":"4","author":"A Genkin","year":"2007","unstructured":"Genkin, A., Lewis, D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics, 4, 9. https:\/\/doi.org\/10.1198\/004017007000000245.","journal-title":"Technometrics"},{"key":"9583_CR24","doi-asserted-by":"publisher","unstructured":"de\u00a0Gispert, A., Iglesias, G., & Byrne, B. (2015). Fast and accurate preordering for SMT using neural networks. In: Proceedings of the 2015 conference of the North American chapter of the association for computational linguistics: human language technologies (pp. 1012\u20131017). Denver, Colorado: Association for Computational Linguistics. https:\/\/doi.org\/10.3115\/v1\/N15-1105, https:\/\/www.aclweb.org\/anthology\/N15-1105","DOI":"10.3115\/v1\/N15-1105"},{"issue":"4","key":"9583_CR25","doi-asserted-by":"publisher","first-page":"215","DOI":"10.14257\/ijmue.2015.10.4.21","volume":"10","author":"ND Gitari","year":"2015","unstructured":"Gitari, N. D., Zuping, Z., Damien, H., & Long, J. (2015). A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering, 10(4), 215\u2013230.","journal-title":"International Journal of Multimedia and Ubiquitous Engineering"},{"key":"9583_CR26","doi-asserted-by":"publisher","unstructured":"Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In: KDD \u201904, Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 168-177). New York, NY, USA: Association for Computing Machinery. 
https:\/\/doi.org\/10.1145\/1014052.1014073","DOI":"10.1145\/1014052.1014073"},{"key":"9583_CR27","doi-asserted-by":"publisher","unstructured":"Jiang, Q., Chen, L., Xu, R., Ao, X., & Yang, M. (2019). A challenge dataset and effective models for aspect-based sentiment analysis. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 6279\u20136284). Hong Kong, China: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/D19-1654, https:\/\/www.aclweb.org\/anthology\/D19-1654","DOI":"10.18653\/v1\/D19-1654"},{"key":"9583_CR28","doi-asserted-by":"publisher","unstructured":"Jin, S., & Pedersen, T. (2018). Duluth UROP at SemEval-2018 task 2: Multilingual emoji prediction with ensemble learning and oversampling. In: Proceedings of The 12th international workshop on semantic evaluation (pp. 482\u2013485). New Orleans, Louisiana: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/S18-1077, https:\/\/www.aclweb.org\/anthology\/S18-1077","DOI":"10.18653\/v1\/S18-1077"},{"key":"9583_CR29","doi-asserted-by":"crossref","unstructured":"Jose, N., Chakravarthi, B. R., Suryawanshi, S., Sherly, E., & McCrae, JP. (2020). A survey of current datasets for code-switching research. In: 2020 6th International conference on advanced computing & communication systems (ICACCS)","DOI":"10.1109\/ICACCS48705.2020.9074205"},{"key":"9583_CR30","unstructured":"Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG! In: Fifth international AAAI conference on weblogs and social media. Citeseer"},{"issue":"1","key":"9583_CR31","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1177\/001316447003000105","volume":"30","author":"K Krippendorff","year":"1970","unstructured":"Krippendorff, K. (1970). 
Estimating the reliability, systematic error and random error of interval data. Educational and Psychological Measurement, 30(1), 61\u201370. https:\/\/doi.org\/10.1177\/001316447003000105","journal-title":"Educational and Psychological Measurement"},{"key":"9583_CR32","doi-asserted-by":"crossref","unstructured":"Krishna, P. V., Misra, S., Joshi, D., & Obaidat, M. S. (2013). Learning automata based sentiment analysis for recommender system on cloud. In: 2013 International conference on computer, information and telecommunication systems (CITS) (pp. 1\u20135). IEEE.","DOI":"10.1109\/CITS.2013.6705715"},{"key":"9583_CR33","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511486876","volume-title":"The Dravidian Languages","author":"B Krishnamurti","year":"2003","unstructured":"Krishnamurti, B. (2003). The Dravidian Languages. Cambridge: Cambridge University Press."},{"key":"9583_CR34","doi-asserted-by":"crossref","unstructured":"Kumar, B. S., Thenmozhi, D., & Kayalvizhi, S. (2020). Tamil paraphrase detection using encoder-decoder neural networks. In: International conference on computational intelligence in data science (pp. 30\u201342). Springer","DOI":"10.1007\/978-3-030-63467-4_3"},{"key":"9583_CR35","unstructured":"Kumar, R., Ojha, A. K., Malmasi, S., & Zampieri, M. (2018). Benchmarking aggression identification in social media. In: Proceedings of the first workshop on trolling, aggression and cyberbullying (TRAC-2018) (pp. 1\u201311). Santa Fe, New Mexico, USA: Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/W18-4401"},{"key":"9583_CR36","unstructured":"Lample, G., & Conneau, A. (2019). Cross-lingual language model pretraining. arXiv:1901.07291"},{"key":"9583_CR37","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). ALBERT: A lite BERT for self-supervised learning of language representations. 
arXiv preprint arXiv:1909.11942"},{"key":"9583_CR38","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692"},{"key":"9583_CR39","unstructured":"M\u00e6hlum, P., Barnes, J., \u00d8vrelid, L., & Velldal, E. (2019). Annotating evaluative sentences for sentiment analysis: A dataset for Norwegian. In: Proceedings of the 22nd Nordic conference on computational linguistics (pp. 121\u2013130). Link\u00f6ping University Electronic Press, Turku, Finland. https:\/\/www.aclweb.org\/anthology\/W19-6113"},{"key":"9583_CR40","unstructured":"Mahadevan, I. (2003). Early Tamil epigraphy: From the earliest times to the sixth century AD"},{"key":"9583_CR41","doi-asserted-by":"publisher","unstructured":"Mandl, T., Modha, S., Kumar, M. A., & Chakravarthi, B. R. (2020). Overview of the HASOC track at FIRE 2020: Hate speech and offensive language identification in Tamil, Malayalam, Hindi, English and German. Forum for information retrieval evaluation (pp. 29\u201332). New York, NY, USA, FIRE: Association for Computing Machinery. https:\/\/doi.org\/10.1145\/3441501.3441517","DOI":"10.1145\/3441501.3441517"},{"key":"9583_CR42","doi-asserted-by":"crossref","unstructured":"Musto, C., de\u00a0Gemmis, M., Semeraro, G., & Lops, P. (2017). A multi-criteria recommender system exploiting aspect-based sentiment analysis of users\u2019 reviews. In: Proceedings of the eleventh ACM conference on recommender systems (pp. 321\u2013325)","DOI":"10.1145\/3109859.3109905"},{"key":"9583_CR43","unstructured":"Ng, A. Y., & Jordan, M. I. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In: Advances in neural information processing systems (pp. 841\u2013848)"},{"key":"9583_CR44","unstructured":"Nongmeikapam, K., Kumar, W., & Singh, M. P. (2017). 
Exploring an efficient handwritten Manipuri meetei-mayek character recognition using gradient feature extractor and cosine distance based multiclass k-nearest neighbor classifier. In: Proceedings of the 14th international conference on natural language processing (ICON-2017) (pp. 328\u2013337). Kolkata, India: NLP Association of India. https:\/\/www.aclweb.org\/anthology\/W17-7541"},{"key":"9583_CR45","doi-asserted-by":"publisher","unstructured":"Pang, B., & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd annual meeting of the association for computational linguistics (ACL-04) (pp. 271\u2013278). Barcelona, Spain. https:\/\/doi.org\/10.3115\/1218955.1218990, https:\/\/www.aclweb.org\/anthology\/P04-1035","DOI":"10.3115\/1218955.1218990"},{"issue":"2","key":"9583_CR46","doi-asserted-by":"publisher","first-page":"154","DOI":"10.4040\/jkan.2013.43.2.154","volume":"43","author":"H Park","year":"2013","unstructured":"Park, H. (2013). An introduction to logistic regression: From basic concepts to interpretation with particular attention to nursing domain. Journal of Korean Academy of Nursing, 43(2), 154\u2013164.","journal-title":"Journal of Korean Academy of Nursing"},{"key":"9583_CR47","doi-asserted-by":"crossref","unstructured":"Patwa, P., Aguilar, G., Kar, S., Pandey, S., PYKL, S., Gamb\u00e4ck, B., Chakraborty, T., Solorio, T., & Das, A. (2020). Semeval-2020 task 9: Overview of sentiment analysis of code-mixed tweets. In: Proceedings of the 14th International workshop on semantic evaluation (SemEval-2020). Barcelona, Spain: Association for Computational Linguistics","DOI":"10.18653\/v1\/2020.semeval-1.100"},{"key":"9583_CR48","volume-title":"A Primer of Tamil Literature","author":"MP Pillai","year":"1904","unstructured":"Pillai, M. P. (1904). A Primer of Tamil Literature. 
Chennai: Ananda Press."},{"key":"9583_CR49","doi-asserted-by":"publisher","unstructured":"Pratapa, A., Bhat, G., Choudhury, M., Sitaram, S., Dandapat, S., & Bali, K. (2018). Language modeling for code-mixing: The role of linguistic theory based synthetic data. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Vol. 1: long papers, pp. 1543\u20131553). Association for Computational Linguistics, Melbourne, Australia. https:\/\/doi.org\/10.18653\/v1\/P18-1143, https:\/\/www.aclweb.org\/anthology\/P18-1143","DOI":"10.18653\/v1\/P18-1143"},{"key":"9583_CR50","unstructured":"Rani, P., Suryawanshi, S., Goswami, K., Chakravarthi, B. R., Fransen, T., & McCrae, J. P. (2020). A comparative study of different state-of-the-art hate speech detection methods for Hindi-English code-mixed data. In: Proceedings of the second workshop on trolling, aggression and cyberbullying. Marseille, France: European Language Resources Association (ELRA)"},{"key":"9583_CR51","doi-asserted-by":"publisher","unstructured":"Ranjan, P., Raja, B., Priyadharshini, R., & Balabantaray, R. C. (2016). A comparative study on code-mixed data of Indian social media vs formal text. In: 2016 2nd International conference on contemporary computing and informatics (IC3I) (pp. 608\u2013611). https:\/\/doi.org\/10.1109\/IC3I.2016.7918035","DOI":"10.1109\/IC3I.2016.7918035"},{"key":"9583_CR52","unstructured":"Rogers, A., Romanov, A., Rumshisky, A., Volkova, S., Gronas, M., & Gribov, A. (2018). RuSentiment: An enriched sentiment analysis dataset for social media in Russian. In: Proceedings of the 27th international conference on computational linguistics (pp. 755\u2013763). Santa Fe, New Mexico, USA: Association for Computational Linguistics. https:\/\/www.aclweb.org\/anthology\/C18-1064"},{"key":"9583_CR53","doi-asserted-by":"crossref","unstructured":"Sakuntharaj, R., & Mahesan, S. (2016). A novel hybrid approach to detect and correct spelling in Tamil text. 
In: 2016 IEEE international conference on information and automation for sustainability (ICIAfS) (pp. 1\u20136). IEEE","DOI":"10.1109\/ICIAFS.2016.7946522"},{"key":"9583_CR54","doi-asserted-by":"crossref","unstructured":"Sakuntharaj, R., & Mahesan, S. (2017). Use of a novel hash-table for speeding-up suggestions for misspelt Tamil words. In: 2017 IEEE international conference on industrial and information systems (ICIIS) (pp. 1\u20135). IEEE","DOI":"10.1109\/ICIINFS.2017.8300346"},{"key":"9583_CR55","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780195099843.001.0001","volume-title":"Indian Epigraphy: A Guide to the Study of Inscriptions in Sanskrit, Prakrit, and the Other Indo-Aryan Languages","author":"R Salomon","year":"1998","unstructured":"Salomon, R. (1998). Indian Epigraphy: A Guide to the Study of Inscriptions in Sanskrit, Prakrit, and the Other Indo-Aryan Languages. Oxford: Oxford University Press."},{"key":"9583_CR56","unstructured":"Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108"},{"key":"9583_CR57","unstructured":"Sekhar, A. C. (1951). [Evolution of Malayalam]. Bulletin of the Deccan College Research Institute, 12(1\/2), 1\u2013216. http:\/\/www.jstor.org\/stable\/42929457"},{"key":"9583_CR58","doi-asserted-by":"publisher","unstructured":"Severyn, A., Moschitti, A., Uryupina, O., Plank, B., & Filippova, K. (2014). Opinion mining on YouTube. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (Vol. 1: long papers, pp. 1252\u20131261). Baltimore, Maryland: Association for Computational Linguistics. 
https:\/\/doi.org\/10.3115\/v1\/P14-1118, https:\/\/www.aclweb.org\/anthology\/P14-1118","DOI":"10.3115\/v1\/P14-1118"},{"issue":"1","key":"9583_CR59","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s41133-020-00039-7","volume":"5","author":"K Shah","year":"2020","unstructured":"Shah, K., Patel, H., Sanghvi, D., & Shah, M. (2020). A comparative analysis of logistic regression, random forest and KNN models for the text classification. Augmented Human Research, 5(1), 1\u201316.","journal-title":"Augmented Human Research"},{"key":"9583_CR60","doi-asserted-by":"crossref","unstructured":"Shalini, K., Ganesh, H. B., Kumar, M. A., & Soman, K. P. (2018). Sentiment analysis for code-mixed Indian social media text with distributed representation. In: 2018 International conference on advances in computing, communications and informatics (ICACCI) (pp. 1126\u20131131)","DOI":"10.1109\/ICACCI.2018.8554835"},{"key":"9583_CR61","volume-title":"Keeladi: An Urban Settlement of Sangam Age on the Banks of River Vaigai","author":"R Sivanantham","year":"2019","unstructured":"Sivanantham, R., & Seran, M. (2019). Keeladi: An Urban Settlement of Sangam Age on the Banks of River Vaigai. Chennai, India: Department of Archaeology, Government of Tamil Nadu."},{"key":"9583_CR62","doi-asserted-by":"publisher","unstructured":"Sowmya Lakshmi, B. S., & Shambhavi, B. R. (2017). An automatic language identification system for code-mixed English-Kannada social media text. In: 2017 2nd International conference on computational systems and information technology for sustainable solution (CSITSS) (pp. 1\u20135). https:\/\/doi.org\/10.1109\/CSITSS.2017.8447784","DOI":"10.1109\/CSITSS.2017.8447784"},{"issue":"16","key":"9583_CR63","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1515\/ijsl.1978.16.109","volume":"1978","author":"SN Sridhar","year":"1978","unstructured":"Sridhar, S. N. (1978). On the functions of code-mixing in Kannada. 
International Journal of the Sociology of Language, 1978(16), 109\u2013118.","journal-title":"International Journal of the Sociology of Language"},{"issue":"4","key":"9583_CR64","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1037\/h0081105","volume":"34","author":"SN Sridhar","year":"1980","unstructured":"Sridhar, S. N., & Sridhar, K. K. (1980). The syntax and psycholinguistics of bilingual code mixing. Canadian Journal of Psychology\/Revue canadienne de psychologie, 34(4), 407.","journal-title":"Canadian Journal of Psychology\/Revue canadienne de psychologie"},{"key":"9583_CR65","first-page":"317","volume":"292","author":"B Swamy","year":"1975","unstructured":"Swamy, B. (1975). The date of Tolkappiyam: A retrospect. Annals of Oriental Research (Madras), Silver Jubilee, 292, 317.","journal-title":"Annals of Oriental Research (Madras), Silver Jubilee"},{"key":"9583_CR66","doi-asserted-by":"crossref","DOI":"10.1163\/9789004658608","volume-title":"Tamil Love Poetry and Poetics","author":"T Takahashi","year":"1995","unstructured":"Takahashi, T. (1995). Tamil Love Poetry and Poetics (Vol. 9). New York: Brill."},{"key":"9583_CR67","first-page":"13","volume":"5","author":"KP Thamburaj","year":"2015","unstructured":"Thamburaj, K. P., & Rengganathan, V. (2015). A critical study of spm Tamil literature exam paper. Asian Journal of Assessment in Teaching and Learning, 5, 13\u201324.","journal-title":"Asian Journal of Assessment in Teaching and Learning"},{"key":"9583_CR68","doi-asserted-by":"crossref","unstructured":"Thamburaj, K. P., Arumugum, L., & Samuel, SJ. (2015). An analysis on keyboard writing skills in online learning. In: 2015 International symposium on technology management and emerging technologies (ISTMET) (pp. 373\u2013377). IEEE","DOI":"10.1109\/ISTMET.2015.7359062"},{"key":"9583_CR69","doi-asserted-by":"publisher","unstructured":"Thavareesan, S., & Mahesan, S. (2019). 
Sentiment analysis in Tamil texts: A study on machine learning techniques and feature representation. In: 2019 14th Conference on industrial and information systems (ICIIS) (pp. 320\u2013325). https:\/\/doi.org\/10.1109\/ICIIS47346.2019.9063341","DOI":"10.1109\/ICIIS47346.2019.9063341"},{"key":"9583_CR70","doi-asserted-by":"publisher","unstructured":"Thavareesan, S., & Mahesan, S. (2020a). Sentiment lexicon expansion using Word2vec and fastText for sentiment prediction in Tamil texts. In: 2020 Moratuwa engineering research conference (MERCon) (pp. 272\u2013276). https:\/\/doi.org\/10.1109\/MERCon50084.2020.9185369","DOI":"10.1109\/MERCon50084.2020.9185369"},{"key":"9583_CR71","doi-asserted-by":"publisher","unstructured":"Thavareesan, S., & Mahesan, S. (2020b). Word embedding-based Part of Speech tagging in Tamil texts. In: 2020 IEEE 15th International conference on industrial and information systems (ICIIS) (pp. 478\u2013482). https:\/\/doi.org\/10.1109\/ICIIS51140.2020.9342640","DOI":"10.1109\/ICIIS51140.2020.9342640"},{"issue":"10","key":"9583_CR72","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s12046-018-0942-7","volume":"43","author":"D Thenmozhi","year":"2018","unstructured":"Thenmozhi, D., & Aravindan, C. (2018). Ontology-based Tamil-English cross-lingual information retrieval system. S\u0101dhan\u0101, 43(10), 1\u201314.","journal-title":"S\u0101dhan\u0101"},{"key":"9583_CR73","unstructured":"Thottingal, S. (2019). Finite state transducer based morphology analysis for Malayalam language. In: Proceedings of the 2nd workshop on technologies for MT of low resource languages. European Association for Machine Translation (pp. 1\u20135). Dublin, Ireland. https:\/\/www.aclweb.org\/anthology\/W19-6801"},{"key":"9583_CR74","doi-asserted-by":"publisher","unstructured":"Tian, Y., Galery, T., Dulcinati, G., Molimpakis, E., & Sun, C. (2017). Facebook sentiment: Reactions and emojis. 
In: Proceedings of the fifth international workshop on natural language processing for social media (pp. 11\u201316). Association for Computational Linguistics, Valencia, Spain, https:\/\/doi.org\/10.18653\/v1\/W17-1102, https:\/\/www.aclweb.org\/anthology\/W17-1102","DOI":"10.18653\/v1\/W17-1102"},{"key":"9583_CR75","first-page":"109","volume-title":"Development of Prototype Morphological Analyzer for he South Indian Language of Kannada","author":"TN Vikram","year":"2007","unstructured":"Vikram, T. N., & Urs, S. R. (2007). Development of Prototype Morphological Analyzer for he South Indian Language of Kannada (pp. 109\u2013116). Berlin: Springer."},{"key":"9583_CR76","doi-asserted-by":"publisher","unstructured":"Vyas, Y., Gella, S., Sharma, J., Bali, K., & Choudhury, M. (2014). POS tagging of English-Hindi code-mixed social media content. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 974\u2013979). Doha, Qatar: Association for Computational Linguistics. https:\/\/doi.org\/10.3115\/v1\/D14-1105, https:\/\/www.aclweb.org\/anthology\/D14-1105","DOI":"10.3115\/v1\/D14-1105"},{"issue":"2","key":"9583_CR77","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1007\/s10579-005-7880-9","volume":"39","author":"J Wiebe","year":"2005","unstructured":"Wiebe, J., Wilson, T., & Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2), 165\u2013210. https:\/\/doi.org\/10.1007\/s10579-005-7880-9","journal-title":"Language Resources and Evaluation"},{"key":"9583_CR78","doi-asserted-by":"crossref","unstructured":"Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of human language technology conference and conference on empirical methods in natural language processing (pp. 347\u2013354). Vancouver, British Columbia, Canada: Association for Computational Linguistics. 
https:\/\/www.aclweb.org\/anthology\/H05-1044","DOI":"10.3115\/1220575.1220619"},{"key":"9583_CR79","doi-asserted-by":"publisher","unstructured":"Winata, G. I., Lin, Z., & Fung, P. (2019). Learning multilingual meta-embeddings for code-switching named entity recognition. In: Proceedings of the 4th workshop on representation learning for NLP (RepL4NLP-2019) (pp. 181\u2013186). Florence, Italy: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W19-4320, https:\/\/www.aclweb.org\/anthology\/W19-4320","DOI":"10.18653\/v1\/W19-4320"},{"key":"9583_CR80","doi-asserted-by":"publisher","unstructured":"Zampieri, M., Malmasi, S., Nakov, P., Rosenthal, S., Farra, N., & Kumar, R. (2019). Predicting the type and target of offensive posts in social media. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: Human language technologies (Vol. 1, pp. 1415\u20131420) (long and short papers). Minneapolis, Minnesota: Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/N19-1144, https:\/\/www.aclweb.org\/anthology\/N19-1144","DOI":"10.18653\/v1\/N19-1144"},{"key":"9583_CR81","doi-asserted-by":"crossref","unstructured":"Zampieri, M., Nakov, P., Rosenthal, S., Atanasova, P., Karadzhov, G., Mubarak, H., Derczynski, L., Pitenis, Z., & \u00c7\u00f6ltekin, \u00c7. (2020). SemEval-2020 Task 12: Multilingual offensive language identification in social media (OffensEval 2020). In: Proceedings of SemEval","DOI":"10.18653\/v1\/2020.semeval-1.188"},{"key":"9583_CR82","first-page":"345","volume":"59","author":"KV Zvelebil","year":"1991","unstructured":"Zvelebil, K. V. (1991). Comments on the Tolkappiyam theory of literature. 
Arch\u00edv Orient\u00e1ln\u00ed, 59, 345\u2013359.","journal-title":"Arch\u00edv Orient\u00e1ln\u00ed"}],"container-title":["Language Resources and Evaluation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-022-09583-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10579-022-09583-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10579-022-09583-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T23:59:39Z","timestamp":1700179179000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10579-022-09583-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,4]]},"references-count":83,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["9583"],"URL":"https:\/\/doi.org\/10.1007\/s10579-022-09583-7","relation":{},"ISSN":["1574-020X","1574-0218"],"issn-type":[{"value":"1574-020X","type":"print"},{"value":"1574-0218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,4]]},"assertion":[{"value":"24 January 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 February 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}