{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T16:50:22Z","timestamp":1768409422556,"version":"3.49.0"},"reference-count":98,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,11,25]],"date-time":"2022-11-25T00:00:00Z","timestamp":1669334400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2023,1,31]]},"abstract":"<jats:p>\n            Due to the lack of a large annotated corpus, many resource-poor Indian languages struggle to reap the benefits of recent deep feature representations in\n            <jats:bold>Natural Language Processing (NLP)<\/jats:bold>\n            . Moreover, adopting existing language models trained on large English corpora for Indian languages is often limited by data availability, rich morphological variation, syntax, and semantic differences. In this paper, we explore the traditional to recent efficient representations to overcome the challenges of a low resource language, Telugu. In particular, our main objective is to mitigate the low-resource problem for Telugu. Overall, we present several contributions to a resource-poor language viz. Telugu. (i) a large annotated data (35,142 sentences in each task) for multiple NLP tasks such as sentiment analysis, emotion identification, hate-speech detection, and sarcasm detection, (ii) we create different lexicons for sentiment, emotion, and hate-speech for improving the efficiency of the models, (iii) pretrained word and sentence embeddings, and (iv) different pretrained language models for Telugu such as\n            <jats:italic>ELMo-Te<\/jats:italic>\n            ,\n            <jats:italic>BERT-Te<\/jats:italic>\n            ,\n            <jats:italic>RoBERTa-Te<\/jats:italic>\n            ,\n            <jats:italic>ALBERT-Te<\/jats:italic>\n            , and\n            <jats:italic>DistilBERT-Te<\/jats:italic>\n            on a large Telugu corpus consisting of 8,015,588 sentences (1,637,408 sentences from Telugu Wikipedia and 6,378,180 sentences crawled from different Telugu websites). Further, we show that these representations significantly improve the performance of four NLP tasks and present the benchmark results for Telugu. We argue that our pretrained embeddings are competitive or better than the existing multilingual pretrained models:\n            <jats:italic>mBERT<\/jats:italic>\n            ,\n            <jats:italic>XLM-R<\/jats:italic>\n            , and\n            <jats:italic>IndicBERT<\/jats:italic>\n            . Lastly, the fine-tuning of pretrained models show higher performance than linear probing results on four NLP tasks with the following F1-scores: Sentiment (68.72), Emotion (58.04), Hate-Speech (64.27), and Sarcasm (77.93). We also experiment on publicly available Telugu datasets (Named Entity Recognition, Article Genre Classification, and Sentiment Analysis) and find that our Telugu pretrained language models (\n            <jats:italic>BERT-Te<\/jats:italic>\n            and\n            <jats:italic>RoBERTa-Te<\/jats:italic>\n            ) outperform the state-of-the-art system except for the sentiment task. We open-source our corpus, four different datasets, lexicons, embeddings, and code \u00a0https:\/\/github.com\/Cha14ran\/DREAM-T. The pretrained Transformer models for Telugu are available at \u00a0https:\/\/huggingface.co\/ltrctelugu.\n          <\/jats:p>","DOI":"10.1145\/3531535","type":"journal-article","created":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T11:41:22Z","timestamp":1651232482000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Am I a Resource-Poor Language? Data Sets, Embeddings, Models and Analysis for four different NLP Tasks in Telugu Language"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1184-640X","authenticated-orcid":false,"given":"Mounika","family":"Marreddy","sequence":"first","affiliation":[{"name":"IIITH, Hyderabad, Telengana, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5975-622X","authenticated-orcid":false,"given":"Subba Reddy","family":"Oota","sequence":"additional","affiliation":[{"name":"IIITH, Hyderabad, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8398-576X","authenticated-orcid":false,"given":"Lakshmi Sireesha","family":"Vakada","sequence":"additional","affiliation":[{"name":"IIITH, Hyderabad, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4573-1096","authenticated-orcid":false,"given":"Venkata Charan","family":"Chinni","sequence":"additional","affiliation":[{"name":"IIITH, Hyderabad, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0171-0816","authenticated-orcid":false,"given":"Radhika","family":"Mamidi","sequence":"additional","affiliation":[{"name":"IIITH, Hyderabad, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11,25]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1067"},{"key":"e_1_3_3_3_2","article-title":"Modelling context with user embeddings for sarcasm detection in social media","author":"Amir Silvio","year":"2016","unstructured":"Silvio Amir, Byron C. Wallace, Hao Lyu, Paula Carvalho, and M\u00e1rio J. Silva. 2016. Modelling context with user embeddings for sarcasm detection in social media. arXiv preprint arXiv:1607.00976 (2016).","journal-title":"arXiv preprint arXiv:1607.00976"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-1909-8_11"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-2609"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11164-8_42"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118726"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1002\/poi3.85"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1140\/epjds\/s13688-016-0072-6"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.5555\/1873781.1873802"},{"key":"e_1_3_3_13_2","article-title":"Lifelong learning for sentiment classification","author":"Chen Zhiyuan","year":"2018","unstructured":"Zhiyuan Chen, Nianzu Ma, and Bing Liu. 2018. Lifelong learning for sentiment classification. arXiv preprint arXiv:1801.02808 (2018).","journal-title":"arXiv preprint arXiv:1801.02808"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.5555\/1613715.1613816"},{"key":"e_1_3_3_16_2","article-title":"Sentiment analysis of code-mixed languages leveraging resource rich languages","author":"Choudhary Nurendra","year":"2018","unstructured":"Nurendra Choudhary, Rajat Singh, Ishita Bindlish, and Manish Shrivastava. 2018. Sentiment analysis of code-mixed languages leveraging resource rich languages. arXiv preprint arXiv:1804.00806 (2018).","journal-title":"arXiv preprint arXiv:1804.00806"},{"key":"e_1_3_3_17_2","article-title":"Empirical evaluation of gated recurrent neural networks on sequence modeling","author":"Chung Junyoung","year":"2014","unstructured":"Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).","journal-title":"arXiv preprint arXiv:1412.3555"},{"key":"e_1_3_3_18_2","doi-asserted-by":"crossref","unstructured":"Trevor Cohn and Phil Blunsom. 2005. Semantic role labelling with tree conditional random fields. (2005).","DOI":"10.3115\/1706543.1706573"},{"key":"e_1_3_3_19_2","first-page":"7057","volume-title":"Advances in Neural Information Processing Systems","author":"Conneau Alexis","year":"2019","unstructured":"Alexis Conneau and Guillaume Lample. 2019. Cross-lingual language model pretraining. In Advances in Neural Information Processing Systems. 7057\u20137067."},{"key":"e_1_3_3_20_2","first-page":"56","volume-title":"Proceedings of the Eighth Workshop on Asian Language Resources","author":"Das Amitava","year":"2010","unstructured":"Amitava Das and Sivaji Bandyopadhyay. 2010. SentiWordNet for Indian languages. In Proceedings of the Eighth Workshop on Asian Language Resources. 56\u201363."},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/1870568.1870582"},{"key":"e_1_3_3_22_2","article-title":"Multilingual BERT -r","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Multilingual BERT -r. https:\/\/github.com\/google-research\/bert\/blob\/master\/multilingual.md.","journal-title":"https:\/\/github.com\/google-research\/bert\/blob\/master\/multilingual.md"},{"key":"e_1_3_3_23_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171\u20134186."},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1080\/02699939208411068"},{"key":"e_1_3_3_25_2","first-page":"417","volume-title":"LREC","author":"Esuli Andrea","year":"2006","unstructured":"Andrea Esuli and Fabrizio Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion mining. In LREC, Vol. 6. Citeseer, 417\u2013422."},{"key":"e_1_3_3_26_2","article-title":"Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm","author":"Felbo Bjarke","year":"2017","unstructured":"Bjarke Felbo, Alan Mislove, Anders S\u00f8gaard, Iyad Rahwan, and Sune Lehmann. 2017. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv preprint arXiv:1708.00524 (2017).","journal-title":"arXiv preprint arXiv:1708.00524"},{"key":"e_1_3_3_27_2","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Gangula Rama Rohit Reddy","year":"2018","unstructured":"Rama Rohit Reddy Gangula and Radhika Mamidi. 2018. Resource creation towards automated sentiment analysis in Telugu (a low resource language) and integrating multiple domain sources to enhance sentiment prediction. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)."},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.14257\/ijmue.2015.10.4.21"},{"key":"e_1_3_3_29_2","article-title":"Improved word sense disambiguation using pre-trained contextualized word representations","author":"Hadiwinoto Christian","year":"2019","unstructured":"Christian Hadiwinoto, Hwee Tou Ng, and Wee Chung Gan. 2019. Improved word sense disambiguation using pre-trained contextualized word representations. arXiv preprint arXiv:1910.00194 (2019).","journal-title":"arXiv preprint arXiv:1910.00194"},{"key":"e_1_3_3_30_2","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Heinzerling Benjamin","year":"2018","unstructured":"Benjamin Heinzerling and Michael Strube. 2018. BPEmb: Tokenization-free pre-trained subword embeddings in 275 languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan."},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014073"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3124420"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1104"},{"key":"e_1_3_3_35_2","article-title":"Bag of tricks for efficient text classification","author":"Joulin Armand","year":"2016","unstructured":"Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016).","journal-title":"arXiv preprint arXiv:1607.01759"},{"key":"e_1_3_3_36_2","first-page":"4948","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings","author":"Kakwani Divyanshu","year":"2020","unstructured":"Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, N. C. Gokul, Avik Bhattacharyya, Mitesh M. Khapra, and Pratyush Kumar. 2020. iNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings. 4948\u20134961."},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1062"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.5395\/rde.2014.39.1.74"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220555"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_3_41_2","first-page":"3294","volume-title":"Advances in Neural Information Processing Systems","author":"Kiros Ryan","year":"2015","unstructured":"Ryan Kiros, Yukun Zhu, Russ R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems. 3294\u20133302."},{"key":"e_1_3_3_42_2","article-title":"Predictive embeddings for hate speech detection on Twitter","author":"Kshirsagar Rohan","year":"2018","unstructured":"Rohan Kshirsagar, Tyus Cukuvac, Kathleen McKeown, and Susan McGregor. 2018. Predictive embeddings for hate speech detection on Twitter. arXiv preprint arXiv:1809.10644 (2018).","journal-title":"arXiv preprint arXiv:1809.10644"},{"key":"e_1_3_3_43_2","article-title":"Cross-lingual language model pretraining","author":"Lample Guillaume","year":"2019","unstructured":"Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291 (2019).","journal-title":"arXiv preprint arXiv:1901.07291"},{"key":"e_1_3_3_44_2","article-title":"ALBERT: A lite BERT for self-supervised learning of language representations","author":"Lan Zhenzhong","year":"2019","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019).","journal-title":"arXiv preprint arXiv:1909.11942"},{"key":"e_1_3_3_45_2","first-page":"1188","volume-title":"International Conference on Machine Learning","author":"Le Quoc","year":"2014","unstructured":"Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning. 1188\u20131196."},{"key":"e_1_3_3_46_2","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN52387.2021.9534382"},{"key":"e_1_3_3_48_2","first-page":"6294","volume-title":"Advances in Neural Information Processing Systems","author":"McCann Bryan","year":"2017","unstructured":"Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. In Advances in Neural Information Processing Systems. 6294\u20136305."},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.5555\/2390948.2391015"},{"key":"e_1_3_3_50_2","first-page":"3111","volume-title":"Advances in Neural Information Processing Systems","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111\u20133119."},{"key":"e_1_3_3_51_2","article-title":"NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets","author":"Mohammad Saif M.","year":"2013","unstructured":"Saif M. Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. arXiv preprint arXiv:1308.6242 (2013).","journal-title":"arXiv preprint arXiv:1308.6242"},{"key":"e_1_3_3_52_2","first-page":"26","volume-title":"Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text","author":"Mohammad Saif M.","year":"2010","unstructured":"Saif M. Mohammad and Peter D. Turney. 2010. Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text. Association for Computational Linguistics, 26\u201334."},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8640.2012.00460.x"},{"key":"e_1_3_3_54_2","unstructured":"Karo Moilanen and Stephen Pulman. 2007. Sentiment Composition. (2007)."},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-5408"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-64283-3_26"},{"key":"e_1_3_3_57_2","unstructured":"Ankita Nandy. Beyond Words: Pictograms for Indian Languages. ([n.d.])."},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2883062"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219855"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-3014"},{"key":"e_1_3_3_61_2","article-title":"Enrichment of OntoSenseNet: Adding a sense-annotated Telugu lexicon","author":"Parupalli Sreekavitha","year":"2018","unstructured":"Sreekavitha Parupalli and Navjyoti Singh. 2018. Enrichment of OntoSenseNet: Adding a sense-annotated Telugu lexicon. arXiv preprint arXiv:1804.02186 (2018).","journal-title":"arXiv preprint arXiv:1804.02186"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_3_63_2","article-title":"Deep contextualized word representations","author":"Peters Matthew E.","year":"2018","unstructured":"Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018).","journal-title":"arXiv preprint arXiv:1802.05365"},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1511\/2001.4.344"},{"key":"e_1_3_3_65_2","article-title":"Linguistically regularized LSTMs for sentiment classification","author":"Qian Qiao","year":"2016","unstructured":"Qiao Qian, Minlie Huang, Jinhao Lei, and Xiaoyan Zhu. 2016. Linguistically regularized LSTMs for sentiment classification. arXiv preprint arXiv:1611.03949 (2016).","journal-title":"arXiv preprint arXiv:1611.03949"},{"key":"e_1_3_3_66_2","first-page":"133","volume-title":"Proceedings of the First Instructional Conference on Machine Learning","volume":"242","author":"Ramos Juan","year":"2003","unstructured":"Juan Ramos et\u00a0al. 2003. Using TF-IDF to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Vol. 242. Piscataway, NJ, 133\u2013142."},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.1140\/epjds\/s13688-016-0093-1"},{"key":"e_1_3_3_68_2","first-page":"704","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Riloff Ellen","year":"2013","unstructured":"Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 704\u2013714."},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1093\/beheco\/arn020"},{"key":"e_1_3_3_70_2","article-title":"DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter","author":"Sanh Victor","year":"2019","unstructured":"Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).","journal-title":"arXiv preprint arXiv:1910.01108"},{"key":"e_1_3_3_71_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1058"},{"key":"e_1_3_3_72_2","first-page":"1631","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Socher Richard","year":"2013","unstructured":"Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1631\u20131642."},{"key":"e_1_3_3_73_2","doi-asserted-by":"publisher","DOI":"10.2753\/MIS0742-1222290408"},{"key":"e_1_3_3_74_2","volume-title":"7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7)","author":"Su\u00e1rez Pedro Javier Ortiz","year":"2019","unstructured":"Pedro Javier Ortiz Su\u00e1rez, Beno\u00eet Sagot, and Laurent Romary. 2019. Asynchronous pipeline for processing huge corpora on medium to low resource infrastructures. In 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7). Leibniz-Institut f\u00fcr Deutsche Sprache."},{"key":"e_1_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220686"},{"key":"e_1_3_3_76_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1167"},{"key":"e_1_3_3_77_2","doi-asserted-by":"publisher","DOI":"10.5555\/1599081.1599192"},{"key":"e_1_3_3_78_2","unstructured":"Madhuri Tummalapalli Manoj Chinnakotla and Radhika Mamidi. 2018. Towards better sentence classification for morphologically rich languages. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing ."},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/3194206.3194209"},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143967"},{"key":"e_1_3_3_81_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-2013"},{"key":"e_1_3_3_82_2","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220619"},{"key":"e_1_3_3_83_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.repl4nlp-1.16"},{"key":"e_1_3_3_84_2","article-title":"Dynamic coattention networks for question answering","author":"Xiong Caiming","year":"2016","unstructured":"Caiming Xiong, Victor Zhong, and Richard Socher. 2016. Dynamic coattention networks for question answering. arXiv preprint arXiv:1611.01604 (2016).","journal-title":"arXiv preprint arXiv:1611.01604"},{"key":"e_1_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.3115\/1557769.1557809"},{"key":"e_1_3_3_86_2","first-page":"5754","volume-title":"Advances in Neural Information Processing Systems","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems. 5754\u20135764."},{"key":"e_1_3_3_87_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-37256-1_89"},{"key":"e_1_3_3_88_2","doi-asserted-by":"publisher","DOI":"10.5555\/1870658.1870760"},{"key":"e_1_3_3_89_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1128"},{"key":"e_1_3_3_90_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-010-5221-8"},{"key":"e_1_3_3_91_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1056"},{"key":"e_1_3_3_92_2","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1253"},{"key":"e_1_3_3_93_2","first-page":"2449","volume-title":"Proceedings of COLING 2016, The 26th International Conference on Computational Linguistics: Technical Papers","author":"Zhang Meishan","year":"2016","unstructured":"Meishan Zhang, Yue Zhang, and Guohong Fu. 2016. Tweet sarcasm detection using deep neural network. In Proceedings of COLING 2016, The 26th International Conference on Computational Linguistics: Technical Papers. 2449\u20132460."},{"key":"e_1_3_3_94_2","first-page":"649","article-title":"Character-level convolutional networks for text classification","volume":"28","author":"Zhang Xiang","year":"2015","unstructured":"Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems 28 (2015), 649\u2013657.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_95_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93417-4_48"},{"key":"e_1_3_3_96_2","volume-title":"Twenty-Fourth International Joint Conference on Artificial Intelligence","author":"Zhao Han","year":"2015","unstructured":"Han Zhao, Zhengdong Lu, and Pascal Poupart. 2015. Self-adaptive hierarchical sentence model. In Twenty-Fourth International Joint Conference on Artificial Intelligence."},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1061"},{"key":"e_1_3_3_98_2","article-title":"Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling","author":"Zhou Peng","year":"2016","unstructured":"Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, and Bo Xu. 2016. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. arXiv preprint arXiv:1611.06639 (2016).","journal-title":"arXiv preprint arXiv:1611.06639"},{"key":"e_1_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.11"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3531535","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3531535","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:22Z","timestamp":1750183762000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3531535"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,25]]},"references-count":98,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,31]]}},"alternative-id":["10.1145\/3531535"],"URL":"https:\/\/doi.org\/10.1145\/3531535","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,25]]},"assertion":[{"value":"2021-07-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-29","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}